GridFM: A Physics-Informed Foundation Model for Multi-Task Energy Forecasting Using Real-Time NYISO Data
Abstract
1. Introduction
1.1. Background and Motivation
1.2. Emergence of Time Series Foundation Models
1.3. Research Gaps and Challenges
1.4. Research Objectives and Contributions
1. Novel Architecture: We introduce GridFM, the first foundation model specifically adapted for multi-task power grid forecasting, with comprehensive validation across multiple ISO regions (NYISO, PJM, CAISO).
2. FreqMixer Adaptation Layer: We propose a novel frequency-domain mixing mechanism that adapts general TSFM representations to power-grid-specific temporal patterns, with grid-specific initialization and quantitative validation of learned frequency responses.
3. Physics-Informed Constraint Module: We develop a constraint module that embeds power system physics, including generation-load balance equations with DC power flow approximation and zonal topology encoding through graph neural networks.
4. Multi-Task Learning Framework: We design a joint forecasting framework for simultaneous prediction of load, LBMP, carbon emissions, and renewable generation with adaptive coupling constraints that learn time-varying relationships.
5. Rigorous Evaluation: We conduct extensive experiments using over 10 years of real-time NYISO data with additional validation on PJM and CAISO, employing rolling-origin cross-validation with five folds and statistical significance testing against both zero-shot and fine-tuned foundation model baselines.
6. Explainability Module: We integrate SHAP-based feature attribution and attention visualization with deployment considerations for grid operators.
7. Open-Source Release: We provide complete code, pre-trained models, and preprocessing scripts at https://github.com/asayghe1/GridFM (accessed on 7 January 2026). The repository is publicly accessible and includes comprehensive documentation, installation instructions, usage examples, and scripts for reproducing all experimental results presented in this paper.
2. Related Work
2.1. Time Series Foundation Models
2.1.1. Language Model Adaptation Approaches
2.1.2. Tokenization-Based Native Models
2.1.3. Native Time Series Architectures
2.2. Deep Learning for Power Grid Forecasting
2.3. Physics-Informed Neural Networks for Power Systems
3. Methodology
3.1. Problem Formulation
3.2. GridFM Architecture Overview
1. Input Embedding Layer: Projects raw features to model dimension with positional and temporal encodings.
2. FreqMixer Adaptation Layer: Transforms representations to power-grid-specific patterns through frequency domain mixing.
3. Foundation Model Backbone: Pre-trained transformer with frozen weights and LoRA adapters for general time series representations.
4. Physics-Informed Constraint Module: Embeds power balance equations and zonal grid topology via GCN.
5. Multi-Task Output Heads: Task-specific prediction layers with explainability features.
3.3. Input Embedding and Positional Encoding
3.4. FreqMixer Adaptation Layer
3.4.1. Spectral Decomposition
3.4.2. Learnable Frequency Mask with Grid-Specific Initialization
3.4.3. Frequency Mixing Network
3.4.4. Inverse Transform and Residual Connection
| Algorithm 1 FreqMixer Adaptation Layer |
| Require: Input embeddings $H \in \mathbb{R}^{T \times d}$; learnable frequency mask $m$; mixing network parameters |
| Ensure: Adapted embeddings $H'$ |
| 1: $\tilde{H} \leftarrow \mathrm{rFFT}(H)$ {spectral decomposition, Section 3.4.1} |
| 2: $\tilde{H} \leftarrow m \odot \tilde{H}$ {learnable frequency mask, Section 3.4.2} |
| 3: $\tilde{H} \leftarrow \mathrm{MixNet}(\tilde{H})$ {frequency mixing network, Section 3.4.3} |
| 4: $\hat{H} \leftarrow \mathrm{irFFT}(\tilde{H})$ {inverse transform, Section 3.4.4} |
| 5: $H' \leftarrow H + \hat{H}$ {residual connection} |
| 6: return $H'$ |
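The algorithm's steps (Sections 3.4.1-3.4.4) amount to an FFT, a learnable mask, a mixing network, an inverse FFT, and a residual connection. A minimal NumPy sketch, with illustrative parameter shapes and a plain two-layer MLP standing in for the paper's mixing network:

```python
import numpy as np

rng = np.random.default_rng(1)

def freqmixer_layer(h, mask, W1, b1, W2, b2):
    """One FreqMixer pass: FFT -> learnable frequency mask -> mixing MLP
    on the real/imag parts -> inverse FFT -> residual connection.
    Parameter shapes are illustrative, not the released configuration."""
    T, d = h.shape
    H = np.fft.rfft(h, axis=0)                      # (T//2+1, d) complex spectrum
    H = H * mask[:, None]                           # learnable frequency mask
    z = np.concatenate([H.real, H.imag], axis=1)    # (F, 2d) real view
    z = np.maximum(z @ W1 + b1, 0.0) @ W2 + b2      # frequency mixing network
    H = z[:, :d] + 1j * z[:, d:]                    # back to complex spectrum
    return h + np.fft.irfft(H, n=T, axis=0)         # residual connection

T, d, hidden = 288, 8, 32
F = T // 2 + 1
h = rng.standard_normal((T, d))
mask = np.ones(F)                                   # learned in training
W1 = rng.standard_normal((2 * d, hidden)) * 0.05
b1 = np.zeros(hidden)
W2 = rng.standard_normal((hidden, 2 * d)) * 0.05
b2 = np.zeros(2 * d)
out = freqmixer_layer(h, mask, W1, b1, W2, b2)
print(out.shape)                                    # same shape as the input
```

If the mixing network's output weights are zero, the layer reduces to the identity through the residual path, matching the usual near-identity initialization of adapter layers.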
3.5. Foundation Model Backbone
3.5.1. Sparse Mixture-of-Experts Layer
3.5.2. Any-Variate Attention
3.5.3. Low-Rank Adaptation (LoRA)
3.6. Physics-Informed Constraint Module
3.6.1. Improved Power Balance Constraint
- $\hat{G}_h$: Predicted total generation at hour h (in MW), computed as the sum of forecasted renewable generation and fossil fuel generation from the respective task heads.
- $\hat{L}_h$: Predicted load demand at hour h (in MW), output from the load forecasting task head.
- $\theta_i, \theta_j$: Predicted voltage phase angles at nodes (zones) i and j, respectively (in radians), derived from the GCN spatial embedding layer.
- $x_{ij}$: Line reactance between nodes i and j (per unit on the system MVA base), obtained from NYISO transmission data. This represents the electrical “distance” between zones.
- $\mathcal{E}$: The set of transmission lines (edges) connecting adjacent zones in the grid topology.
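Under these definitions, the balance term can be checked with the standard nodal DC power flow identity. The sketch below uses our own variable names and a toy two-zone system; the paper's zonal aggregation may differ in detail:

```python
import numpy as np

def dc_balance_residuals(gen_mw, load_mw, theta, x_pu, edges, base_mva=100.0):
    """Nodal DC power flow balance: for every zone i,
    G_i - L_i = sum_j (theta_i - theta_j) / x_ij  (converted to MW).
    Returns one residual per zone; a physics penalty would sum |residuals|."""
    resid = gen_mw - load_mw
    for (i, j), x in zip(edges, x_pu):
        flow_mw = (theta[i] - theta[j]) / x * base_mva  # DC line flow i -> j
        resid[i] -= flow_mw                             # export from zone i
        resid[j] += flow_mw                             # import at zone j
    return resid

# Two-zone toy system: zone 0 exports 50 MW to zone 1 over one tie line.
gen   = np.array([150.0,  50.0])
load  = np.array([100.0, 100.0])
x_pu  = [0.1]                     # per-unit line reactance
edges = [(0, 1)]
theta = np.array([0.05, 0.0])     # (0.05 - 0)/0.1 * 100 MVA = 50 MW flow
print(dc_balance_residuals(gen, load, theta, x_pu, edges))
```

A physics-consistent prediction drives every residual to zero; inconsistent angle predictions leave a nonzero per-zone imbalance that the soft constraint penalizes.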
3.6.2. Zonal Topology Encoding
3.7. Multi-Task Learning Framework
3.7.1. Hard Parameter Sharing Architecture
3.7.2. Uncertainty-Weighted Multi-Task Loss
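A common form of uncertainty weighting (following Kendall et al. [54]) learns one log-variance per task; whether GridFM uses exactly this parameterization is not restated here, so treat the sketch as the generic formulation:

```python
import numpy as np

def uncertainty_weighted_loss(task_losses, log_vars):
    """Homoscedastic uncertainty weighting (Kendall et al.):
    L = sum_k exp(-s_k) * L_k + s_k, with s_k = log(sigma_k^2) learned.
    Tasks with high learned noise are down-weighted automatically."""
    task_losses = np.asarray(task_losses, float)
    log_vars = np.asarray(log_vars, float)
    return float(np.sum(np.exp(-log_vars) * task_losses + log_vars))

losses = [0.8, 1.2, 0.5, 0.9]          # load, price, emission, renewable
s = np.zeros(4)                         # sigma_k = 1 at initialization
print(uncertainty_weighted_loss(losses, s))
```

At initialization (all $s_k = 0$) the weighted loss equals the plain sum of task losses; as a noisy task's $s_k$ grows, its effective weight $e^{-s_k}$ shrinks while the $+s_k$ term prevents the trivial solution of ignoring every task.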
3.7.3. Task-Specific Loss Functions
3.7.4. Adaptive Coupling Constraint Loss
3.8. Explainability Module
3.8.1. SHAP-Based Feature Attribution
3.8.2. Attention Weight Visualization
3.9. Complete Training Algorithm
| Algorithm 2 GridFM Training Algorithm |
| Require: Training dataset $\mathcal{D}$, pre-trained backbone $\theta_{\mathrm{FM}}$, learning rate $\eta$, epochs E |
| Ensure: Trained GridFM parameters $\Theta$ |
| 1: Initialize FreqMixer, physics module, task heads, uncertainty weights |
| 2: Freeze backbone: $\nabla\theta_{\mathrm{FM}} \leftarrow 0$ |
| 3: Initialize LoRA adapters with rank $r$ |
| 4: for epoch $= 1$ to E do |
| 5: for each mini-batch $(X, Y) \in \mathcal{D}$ do |
| 6: $H \leftarrow \mathrm{Embed}(X)$ |
| 7: $H \leftarrow \mathrm{FreqMixer}(H)$ |
| 8: $H \leftarrow \mathrm{Backbone}(H)$ with LoRA |
| 9: $H \leftarrow \mathrm{PhysicsModule}(H)$ |
| 10: for $k = 1$ to K do |
| 11: $\hat{Y}_k \leftarrow \mathrm{Head}_k(H)$ |
| 12: $\mathcal{L}_k \leftarrow \mathrm{TaskLoss}_k(\hat{Y}_k, Y_k)$ |
| 13: end for |
| 14: $\mathcal{L} \leftarrow \sum_k \left(\frac{1}{2\sigma_k^2}\mathcal{L}_k + \log\sigma_k\right) + \lambda_{\mathrm{physics}}\mathcal{L}_{\mathrm{physics}} + \lambda_{\mathrm{coupling}}\mathcal{L}_{\mathrm{coupling}}$ |
| 15: Update parameters: $\Theta \leftarrow \Theta - \eta\nabla_{\Theta}\mathcal{L}$ |
| 16: end for |
| 17: Update coupling weights with rolling window |
| 18: end for |
| 19: return $\Theta$ |
4. Experimental Setup
4.1. Dataset Description
- PJM: January 2018 to December 2024 (7 years), 20 load zones.
- CAISO: January 2018 to December 2024 (7 years), 5 load zones.
4.2. Data Preprocessing
1. Missing Value Handling: Linear interpolation for gaps shorter than 1 h; exclusion for longer gaps (affecting a small fraction of the data).
2. Outlier Detection: Values deviating from the rolling 24 h mean by more than a fixed multiple of the rolling standard deviation are flagged and replaced with interpolated values. Specifically, for each time step t, we compute the rolling mean $\mu_t$ and standard deviation $\sigma_t$ using a centered 24 h window (288 samples). A value $x_t$ is flagged as an outlier if $|x_t - \mu_t| > k\sigma_t$. Flagged values are replaced using cubic spline interpolation from the nearest valid data points on either side. The threshold was chosen to balance sensitivity to genuine anomalies (e.g., equipment failures, data transmission errors) against false positives during normal demand fluctuations such as morning ramp-ups or evening peaks; it flags only a small share of the data as outliers while preserving legitimate extreme values during heat waves or cold snaps.
3. Normalization: Per-zone z-score normalization using training set statistics.
4. Feature Engineering: Calendar features (hour, day, month, holiday indicators), lagged values (1 h, 24 h, 168 h), and weather features.
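The outlier step can be sketched with a centered rolling window. The multiplier k and the linear (rather than cubic spline) replacement below are simplifications of the procedure described above:

```python
import numpy as np

def flag_outliers(x, window=288, k=3.0):
    """Flag values more than k rolling standard deviations from a centered
    rolling mean (window = 288 five-minute samples = 24 h). The multiplier
    k is illustrative; the paper's exact threshold is given in Section 4.2."""
    n = len(x)
    flags = np.zeros(n, dtype=bool)
    half = window // 2
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half)
        w = x[lo:hi]
        mu, sd = w.mean(), w.std()
        if sd > 0 and abs(x[t] - mu) > k * sd:
            flags[t] = True
    return flags

def replace_outliers(x, flags):
    """Replace flagged samples by interpolating from valid neighbours
    (linear here for brevity; the paper uses cubic splines)."""
    x = x.astype(float).copy()
    good = ~flags
    x[flags] = np.interp(np.flatnonzero(flags), np.flatnonzero(good), x[good])
    return x

rng = np.random.default_rng(2)
load = 1000 + 50 * np.sin(np.linspace(0, 4 * np.pi, 576)) + rng.normal(0, 5, 576)
load[300] = 5000                     # injected telemetry spike
flags = flag_outliers(load)
clean = replace_outliers(load, flags)
print(flags[300], round(clean[300], 1))
```

The spike is flagged because it sits far outside the rolling band, while the daily sinusoidal swing stays well within it, which is the false-positive behavior the threshold choice is meant to control.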
4.3. Train/Validation/Test Split
- Training: January 2014–December 2021 (8 years).
- Validation: January 2022–December 2022 (1 year).
- Test: January 2023–December 2024 (2 years).
4.4. Baseline Models
4.5. Evaluation Metrics
Handling Negative and Near-Zero Prices
1. Exclusion criterion: Time intervals in which the absolute LBMP falls below a small threshold are excluded from the MAPE calculation (affecting 0.8% of price data in our test set).
2. Symmetric MAPE (sMAPE): We additionally report sMAPE, defined as $\mathrm{sMAPE} = \frac{100\%}{N}\sum_{t=1}^{N}\frac{2\,|y_t - \hat{y}_t|}{|y_t| + |\hat{y}_t|}$, which is bounded and well defined for near-zero values.
3. Primary metric: Given these considerations, we emphasize RMSE as the primary metric for price forecasting throughout the paper, as it is unaffected by zero or negative values and directly measures prediction error magnitude in $/MWh.
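Both price metrics can be computed directly. The near-zero exclusion threshold `eps` below is illustrative (the text states only that 0.8% of intervals are excluded):

```python
import numpy as np

def smape(y_true, y_pred):
    """Symmetric MAPE (%): mean of 200 * |y - yhat| / (|y| + |yhat|),
    bounded in [0, 200] and defined for near-zero and negative prices."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    denom = np.abs(y_true) + np.abs(y_pred)
    terms = np.where(denom > 0,
                     2.0 * np.abs(y_true - y_pred) / np.maximum(denom, 1e-12),
                     0.0)
    return 100.0 * terms.mean()

def mape_excluding_near_zero(y_true, y_pred, eps=1.0):
    """MAPE (%) restricted to intervals with |price| >= eps $/MWh;
    eps is an illustrative exclusion threshold."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    keep = np.abs(y_true) >= eps
    return 100.0 * np.mean(np.abs((y_true[keep] - y_pred[keep]) / y_true[keep]))

prices = np.array([45.0, -2.0, 0.1, 60.0])   # $/MWh, incl. a negative price
preds  = np.array([44.0,  0.0, 0.2, 57.0])
print(round(smape(prices, preds), 2), round(mape_excluding_near_zero(prices, preds), 2))
```

Note how the 0.1 $/MWh interval would explode an ordinary MAPE but contributes a bounded term to sMAPE and is dropped entirely under the exclusion criterion.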
4.6. Statistical Testing
- Significance Testing: Paired t-tests with Bonferroni correction for multiple comparisons.
- Effect Size: Cohen’s d for practical significance.
- Confidence Intervals: 95% CI from bootstrap resampling (1000 iterations).
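The effect-size and interval computations can be reproduced in a few lines; the comparison count used for the Bonferroni adjustment below is illustrative:

```python
import numpy as np

def cohens_d_paired(err_a, err_b):
    """Cohen's d for paired samples: mean of the differences divided by
    the standard deviation of the differences."""
    diff = np.asarray(err_a, float) - np.asarray(err_b, float)
    return diff.mean() / diff.std(ddof=1)

def bootstrap_ci(errors, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean error (1000 resamples)."""
    rng = np.random.default_rng(seed)
    errors = np.asarray(errors, float)
    means = np.array([rng.choice(errors, size=len(errors), replace=True).mean()
                      for _ in range(n_boot)])
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])

alpha_bonferroni = 0.05 / 12      # 12 pairwise comparisons (count illustrative)

rng = np.random.default_rng(3)
base_err = np.abs(rng.normal(2.4, 0.3, 500))              # baseline error samples
grid_err = base_err - 0.25 + rng.normal(0, 0.05, 500)     # lower-error model
d = cohens_d_paired(base_err, grid_err)
lo, hi = bootstrap_ci(grid_err)
print(round(d, 2), round(lo, 3), round(hi, 3))
```

A large positive d indicates the improvement is practically, not just statistically, significant; the paired t-test itself would then be run against the Bonferroni-adjusted threshold rather than the raw 0.05 level.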
4.7. Implementation Details
- Model: backbone dimensions (hidden size, layer, and attention-head counts) follow the released configuration in the code repository.
- MoE: eight experts, top-2 routing.
- LoRA: rank $r = 16$ (see Section 5.8 for sensitivity).
- Context/horizon: context $L = 288$ five-minute samples (24 h); forecast horizons from 1 h to 24 h.
- Training: AdamW [65] with cosine annealing, 100 epochs.
- Loss weights: $\lambda_{\mathrm{physics}} = 0.1$, $\lambda_{\mathrm{coupling}} = 0.05$ (sensitivity analysis in Section 5.8).
- Hardware: 4× NVIDIA A100 80 GB GPUs.
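The LoRA update [52] keeps the pre-trained weight frozen and trains only two low-rank factors. A minimal sketch with illustrative dimensions:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0):
    """LoRA: y = x (W + (alpha/r) B A)^T with frozen W and trainable
    low-rank factors A (r x d_in) and B (d_out x r). Only
    r * (d_in + d_out) parameters per adapted matrix are trained."""
    r = A.shape[0]
    delta = (alpha / r) * (B @ A)          # low-rank update, rank <= r
    return x @ (W + delta).T

rng = np.random.default_rng(4)
d_in, d_out, r = 64, 64, 16
W = rng.standard_normal((d_out, d_in)) * 0.05   # frozen pre-trained weight
A = rng.standard_normal((r, d_in)) * 0.05       # trainable down-projection
B = np.zeros((d_out, r))                         # B = 0 => adapter starts as no-op
x = rng.standard_normal((5, d_in))
y = lora_forward(x, W, A, B)
print(np.allclose(y, x @ W.T))                  # True at initialization
```

Initializing B to zero makes the adapted layer exactly match the frozen backbone at the start of fine-tuning, so training only perturbs the pre-trained representation gradually.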
5. Experimental Results
5.1. Main Performance Comparison
5.2. Multi-ISO Validation
5.3. Forecast Horizon Analysis
- Short-term superiority: GridFM shows the largest improvement (22.1% vs. zero-shot, 16.5% vs. fine-tuned) at the 1 h horizon.
- Consistent advantage: The improvement remains significant (14.8% vs. zero-shot, 6.6% vs. fine-tuned) at 24 h.
- Skill score: GridFM maintains positive skill scores exceeding 0.55 at 24 h.
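The skill score above is the standard 1 − MSE_model / MSE_reference form; the reference forecast in the sketch below (lagged persistence) is our assumption, since the section does not restate the paper's reference:

```python
import numpy as np

def skill_score(y_true, y_model, y_ref):
    """Forecast skill relative to a reference: 1 - MSE_model / MSE_ref.
    Positive values mean the model beats the reference; 1 is a perfect
    forecast, 0 matches the reference."""
    def mse(y, yhat):
        return np.mean((np.asarray(y) - np.asarray(yhat)) ** 2)
    return 1.0 - mse(y_true, y_model) / mse(y_true, y_ref)

rng = np.random.default_rng(5)
y = 1000 + 50 * np.sin(np.linspace(0, 2 * np.pi, 288))       # toy daily load
persistence = np.roll(y, 12) + rng.normal(0, 20, 288)        # 1 h-lag reference
model = y + rng.normal(0, 8, 288)                            # tighter forecast
print(round(skill_score(y, model, persistence), 2))
```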
5.4. Seasonal and Temporal Analysis
5.5. Zonal Performance Analysis
5.6. Ablation Study
- LoRA Fine-tuning: 9.5% improvement, establishing a strong baseline.
- FreqMixer: 4.2% additional improvement, validating frequency-domain adaptation.
- Physics Constraints: 2.6% improvement, with largest impact on emission forecasting.
- Multi-Task Learning: 1.8% improvement, with price forecasting benefiting most.
5.7. Physics Constraint Effectiveness
5.8. Hyperparameter Sensitivity Analysis
5.9. Probabilistic Forecasting Evaluation
5.10. Computational Efficiency
6. Discussion
6.1. Key Findings
6.1.1. Foundation Model Adaptation Is Effective
6.1.2. Physics Constraints Improve Both Accuracy and Consistency
6.1.3. Multi-Task Learning Exploits Variable Coupling
6.1.4. FreqMixer Captures Grid-Specific Patterns
6.2. Explainability Analysis
6.3. Limitations
6.3.1. Computational Requirements
6.3.2. Extreme Event Performance
6.3.3. Geographic Scope
6.3.4. Temporal Scope
6.3.5. Physics Constraint Limitations
6.4. Practical Implications
6.4.1. Economic Impact
6.4.2. CLCPA Compliance
6.4.3. Renewable Integration
6.4.4. Deployment Considerations
- Weekly model retraining with a 30-day rolling window.
- Ensemble predictions combining GridFM with operational persistence models.
- Automated anomaly detection to flag low-confidence predictions.
- Human-in-the-loop review for high-stakes decisions.
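The ensemble and anomaly-flag items above can be sketched in a few lines; the blend weight and interval-width threshold below are illustrative, not operational settings:

```python
import numpy as np

def ensemble_forecast(gridfm_pred, persistence_pred, w=0.8):
    """Convex combination of GridFM and an operational persistence
    forecast; the weight w is illustrative."""
    return w * np.asarray(gridfm_pred) + (1 - w) * np.asarray(persistence_pred)

def flag_low_confidence(pi_lower, pi_upper, rel_width=0.2):
    """Flag intervals whose prediction-interval width exceeds rel_width of
    the interval midpoint - a simple gate for human-in-the-loop review."""
    mid = 0.5 * (np.asarray(pi_lower) + np.asarray(pi_upper))
    width = np.asarray(pi_upper) - np.asarray(pi_lower)
    return width > rel_width * np.abs(mid)

pred    = np.array([1000.0, 1200.0])     # GridFM load forecast (MW)
persist = np.array([ 980.0, 1500.0])     # persistence forecast (MW)
blend = ensemble_forecast(pred, persist)  # 0.8/0.2 blend -> [996, 1260]
review = flag_low_confidence([900.0, 1150.0], [1200.0, 1250.0])
print(blend, review)
```

Wide prediction intervals route the corresponding hours to operator review, while narrow ones pass through automatically; the ensemble keeps a conservative anchor when the learned model drifts.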
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- International Energy Agency. World Energy Outlook 2024. 2024. Available online: https://www.iea.org/reports/world-energy-outlook-2024 (accessed on 30 July 2025).
- IPCC. Climate Change 2023: Synthesis Report; Contribution of Working Groups I, II and III to the Sixth Assessment Report; IPCC: Geneva, Switzerland, 2023.
- Mohammadi, M.; Hosseinian, S.H.; Gharehpetian, G.B. Deep Learning for Renewable Energy Forecasting: A Comprehensive Review. Renew. Sustain. Energy Rev. 2024, 189, 113871. [Google Scholar]
- Ahmed, Z.; Kazmi, S.A.A.; Holmberg, T. Smart Grid Forecasting: A Review of Deep Learning Methods for Energy Management. IEEE Access 2024, 12, 12345–12378. [Google Scholar]
- New York Independent System Operator. Power Trends 2024: The Annual State of the Grid Report. 2024. Available online: https://www.nyiso.com/power-trends (accessed on 30 July 2025).
- New York State. Climate Leadership and Community Protection Act. 2019. Available online: https://climate.ny.gov/ (accessed on 11 July 2025).
- Hong, T.; Pinson, P.; Wang, Y.; Weron, R.; Yang, D.; Zareipour, H. Energy Forecasting: A Review and Outlook. IEEE Open Access J. Power Energy 2020, 7, 376–388. [Google Scholar] [CrossRef]
- Haben, S.; Arber, S.; Giasemidis, G.; Sheridan, M.; Sherwin, E.; Williams, T. Review of Low Voltage Load Forecasting: Methods, Applications, and Recommendations. Appl. Energy 2021, 304, 117798. [Google Scholar] [CrossRef]
- Hammad, M.A.; Jereb, B.; Rosi, B.; Dragan, D. Methods and Models for Electric Load Forecasting: A Comprehensive Review. Logist. Sustain. Transp. 2020, 11, 51–76. [Google Scholar] [CrossRef]
- Weron, R. Electricity Price Forecasting: A Review of the State-of-the-Art with a Look into the Future. Int. J. Forecast. 2014, 30, 1030–1081. [Google Scholar] [CrossRef]
- Lindberg, K.B.; Bakker, S.J.; Sartori, I. Modeling Electric and Hybrid Vehicles’ Charging Demand and Grid Impacts. Appl. Energy 2021, 284, 116355. [Google Scholar]
- Wang, H.; Lei, Z.; Zhang, X.; Zhou, B.; Peng, J. A Review of Deep Learning for Renewable Energy Forecasting. Energy Convers. Manag. 2022, 198, 111799. [Google Scholar] [CrossRef]
- Li, K.; Mu, Y.; Yang, F.; Wang, H.; Yan, Y.; Zhang, C. Joint Forecasting of Source-Load-Price for Integrated Energy System Based on Multi-Task Learning and Hybrid Attention Mechanism. Appl. Energy 2024, 360, 122821. [Google Scholar] [CrossRef]
- Lago, J.; De Ridder, F.; Vrancx, P.; De Schutter, B. Forecasting Day-Ahead Electricity Prices in Europe: The Importance of Considering Market Integration. Appl. Energy 2021, 211, 890–903. [Google Scholar] [CrossRef]
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
- Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation. arXiv 2014, arXiv:1406.1078. [Google Scholar] [CrossRef]
- Bai, S.; Kolter, J.Z.; Koltun, V. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. arXiv 2018, arXiv:1803.01271. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is All You Need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Haben, S.; Giasemidis, G.; Ziel, F.; Arber, S. Short Term Load Forecasting and the Effect of Temperature at the Low Voltage Level. Int. J. Forecast. 2019, 35, 1469–1484. [Google Scholar] [CrossRef]
- Hewamalage, H.; Bergmeir, C.; Bandara, K. Recurrent Neural Networks for Time Series Forecasting: Current Status and Future Directions. Int. J. Forecast. 2021, 37, 388–427. [Google Scholar] [CrossRef]
- Lim, B.; Arık, S.Ö.; Loeff, N.; Pfister, T. Temporal Fusion Transformers for Interpretable Multi-Horizon Time Series Forecasting. Int. J. Forecast. 2021, 37, 1748–1764. [Google Scholar] [CrossRef]
- Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 2–9 February 2021; Volume 35, pp. 11106–11115. [Google Scholar]
- Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. In Proceedings of the Advances in Neural Information Processing Systems, Online, 6–14 December 2021; Volume 34, pp. 22419–22430. [Google Scholar]
- Oreshkin, B.N.; Amini, A.A.; Coyle, L.; Coates, M. Meta-Learning Framework with Applications to Zero-Shot Time-Series Forecasting. arXiv 2021, arXiv:2002.02887. [Google Scholar] [CrossRef]
- Liang, Y.; Wen, H.; Nie, Y.; Jiang, Y.; Jin, M.; Song, D.; Pan, S.; Wen, Q. Foundation Models for Time Series Analysis: A Tutorial and Survey. arXiv 2024, arXiv:2403.14735. [Google Scholar] [CrossRef]
- Jin, M.; Wen, Q.; Liang, Y.; Zhang, C.; Xue, S.; Wang, X.; Zhang, J.; Wang, Y.; Chen, H.; Li, X.; et al. Position Paper: What Can Large Language Models Tell Us about Time Series Analysis. arXiv 2024, arXiv:2402.02713. [Google Scholar] [CrossRef]
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models are Few-Shot Learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901. [Google Scholar]
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.-A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. LLaMA: Open and Efficient Foundation Language Models. arXiv 2023, arXiv:2302.13971. [Google Scholar] [CrossRef]
- Das, A.; Kong, W.; Sen, R.; Zhou, Y. A Decoder-Only Foundation Model for Time-Series Forecasting. arXiv 2024, arXiv:2310.10688. [Google Scholar]
- Ansari, A.F.; Stella, L.; Turkmen, C.; Zhang, X.; Mercado, P.; Shen, H.; Shchur, O.; Rangapuram, S.S.; Arango, S.P.; Kapoor, S.; et al. Chronos: Learning the Language of Time Series. arXiv 2024, arXiv:2403.07815. [Google Scholar] [CrossRef]
- Woo, G.; Liu, C.; Kumar, A.; Xiong, C.; Savarese, S.; Sahoo, D. Unified Training of Universal Time Series Forecasting Transformers. In Proceedings of the 41st International Conference on Machine Learning (ICML), Vienna, Austria, 21 July–27 July 2024. [Google Scholar]
- Liu, X.; Liu, J.; Woo, G.; Aksu, T.; Liang, Y.; Zimmermann, R.; Liu, C.; Savarese, S.; Xiong, C.; Sahoo, D. Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts. arXiv 2024, arXiv:2410.10469. [Google Scholar]
- Rasul, K.; Ashok, A.; Williams, A.R.; Ghonia, H.; Bhagwatkar, R.; Khorasani, A.; Bayazi, M.J.D.; Adamopoulos, G.; Riachi, R.; Hassen, N.; et al. Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting. arXiv 2024, arXiv:2310.08278. [Google Scholar] [CrossRef]
- Shi, X.; Chen, S.; Yao, Y.; Wang, L.; Liu, J.; Liu, C. Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts. arXiv 2024, arXiv:2409.16040. [Google Scholar]
- Goswami, M.; Szafer, K.; Choudhry, A.; Cai, Y.; Li, S.; Dubrawski, A. MOMENT: A Family of Open Time-series Foundation Models. arXiv 2024, arXiv:2402.03885. [Google Scholar] [CrossRef]
- Misyris, G.S.; Venzke, A.; Chatzivasileiadis, S. Physics-Informed Neural Networks for Power Systems. In Proceedings of the IEEE PES General Meeting, Montreal, QC, Canada, 2–6 August 2020. [Google Scholar]
- Gruver, N.; Finzi, M.; Qiu, S.; Wilson, A.G. Large Language Models Are Zero-Shot Time Series Forecasters. Adv. Neural Inf. Process. Syst. 2024, 36, 19622–19635. [Google Scholar]
- Xue, H.; Salim, F.D. PromptCast: A New Prompt-based Learning Paradigm for Time Series Forecasting. IEEE Trans. Knowl. Data Eng. 2023, 36, 6851–6864. [Google Scholar] [CrossRef]
- Zhou, T.; Niu, P.; Wang, X.; Sun, L.; Jin, R. One Fits All: Power General Time Series Analysis by Pretrained LM. Adv. Neural Inf. Process. Syst. 2024, 36, 43322–43355. [Google Scholar]
- Garza, A.; Mergenthaler-Canseco, M. TimeGPT-1. arXiv 2024, arXiv:2310.03589. [Google Scholar]
- Kong, W.; Dong, Z.Y.; Jia, Y.; Hill, D.J.; Xu, Y.; Zhang, Y. Short-Term Residential Load Forecasting Based on LSTM Recurrent Neural Network. IEEE Trans. Smart Grid 2019, 10, 841–851. [Google Scholar] [CrossRef]
- Kim, T.Y.; Cho, S.B. Predicting Residential Energy Consumption Using CNN-LSTM Neural Networks. Energy 2019, 182, 72–81. [Google Scholar] [CrossRef]
- Lago, J.; Marcjasz, G.; De Schutter, B.; Weron, R. Forecasting Day-Ahead Electricity Prices: A Review of State-of-the-Art Algorithms, Best Practices and an Open-Access Benchmark. Appl. Energy 2021, 293, 116983. [Google Scholar] [CrossRef]
- Yang, Y.; Liu, X.; Chen, Z.; Wang, J. ATTnet: An Explainable Gated Recurrent Unit Neural Network for High Frequency Electricity Price Forecasting. Int. J. Electr. Power Energy Syst. 2024, 158, 109975. [Google Scholar] [CrossRef]
- Chen, Y.; Xiao, J.; Wang, Y.; Li, Y. Multi-Task Learning for Integrated Energy System Forecasting with Coupling Awareness. Energy Convers. Manag. 2024, 297. [Google Scholar]
- Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-Informed Neural Networks: A Deep Learning Framework for Solving Forward and Inverse Problems Involving Nonlinear Partial Differential Equations. J. Comput. Phys. 2019, 378, 686–707. [Google Scholar] [CrossRef]
- Donon, B.; Clément, R.; Donnot, B.; Marot, A.; Guyon, I.; Schoenauer, M. Graph Neural Networks for Power Grid. arXiv 2020, arXiv:1905.09990. [Google Scholar]
- Hossain, S.; Rahman, A.; Ahmed, S.; Enam, S.; Ahmed, M.T. Interpretable Physics-Informed Neural Networks for Energy Consumption Prediction Using IoT Sensors. Array 2025, 28, 100469. [Google Scholar] [CrossRef]
- Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; Jin, R. Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting. In Proceedings of the International Conference on Machine Learning. PMLR, Baltimore, MD, USA, 17–23 July 2022; pp. 27268–27286. [Google Scholar]
- Qu, K.; Xue, S.; Zheng, X.; Yan, D.; Cao, H. Learning dynamic inter-farm dependencies for wind power forecasting via adaptive sparse graph attention network. Renew. Energy 2026, 258, 124969. [Google Scholar] [CrossRef]
- Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-Rank Adaptation of Large Language Models. arXiv 2022, arXiv:2106.09685. [Google Scholar]
- Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
- Zhang, S.; Liu, Z.; Xu, Y.; Su, H. A Physics-Informed Hybrid Multitask Learning for Lithium-Ion Battery Full-Life Aging Estimation at Early Lifetime. IEEE Trans. Ind. Inform. 2025, 21, 415–424. [Google Scholar] [CrossRef]
- Kendall, A.; Gal, Y.; Cipolla, R. Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7482–7491. [Google Scholar]
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- Koenker, R.; Bassett, G., Jr. Regression Quantiles. Econometrica 1978, 46, 33–50. [Google Scholar] [CrossRef]
- Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
- Box, G.E.; Jenkins, G.M. Time Series Analysis: Forecasting and Control; Holden-Day: San Francisco, CA, USA, 1970. [Google Scholar]
- Taylor, S.J.; Letham, B. Forecasting at Scale. Am. Stat. 2018, 72, 37–45. [Google Scholar] [CrossRef]
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
- Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
- Oreshkin, B.N.; Carpov, D.; Chapados, N.; Bengio, Y. N-BEATS: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
- Nie, Y.; Nguyen, N.H.; Sinthong, P.; Kalagnanam, J. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. In Proceedings of the International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
- Gneiting, T.; Raftery, A.E. Strictly Proper Scoring Rules, Prediction, and Estimation. J. Am. Stat. Assoc. 2007, 102, 359–378. [Google Scholar] [CrossRef]
- Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. arXiv 2019, arXiv:1711.05101. [Google Scholar] [CrossRef]

















| Model | Type | Params | Multivariate | Exogenous | Probabilistic | Open Source |
|---|---|---|---|---|---|---|
| TimesFM [29] | Decoder | 200 M | – | – | – | ✓ |
| Chronos [30] | Enc-Dec | 8–710 M | – | – | ✓ | ✓ |
| Moirai [31] | Encoder | 14–311 M | ✓ | ✓ | ✓ | ✓ |
| Moirai-MoE [32] | Sparse MoE | 11–117 M | ✓ | ✓ | ✓ | ✓ |
| Time-MoE [34] | Sparse MoE | 1–2.4 B | ✓ | ✓ | ✓ | ✓ |
| Lag-Llama [33] | Decoder | 7–70 M | – | – | ✓ | ✓ |
| GridFM (Ours) | Hybrid MoE | 135 M | ✓ | ✓ | ✓ | ✓ |
| Variable | Resolution | Observations | Zones | Source |
|---|---|---|---|---|
| System Load | 5 min | 105,120 × 11/year † | 11 + 1 | nyiso.com/load-data |
| Real-Time LBMP | 5 min | 105,120 × 11/year † | 11 + 1 | nyiso.com/pricing-data |
| Fuel Mix | 5 min | 105,120/year | System | nyiso.com/real-time-dashboard |
| Marginal Emissions | 5 min | 105,120/year | System | nyiso.com/emissions-data |
| Weather (NOAA) | Hourly | 8760/year | 11 stations | ncdc.noaa.gov |
| Model | Load MAPE (%) | Load RMSE (MW) | Load MAE (MW) | Price MAPE (%) | Price RMSE ($/MWh) | Price MAE ($/MWh) |
|---|---|---|---|---|---|---|
| Traditional Methods | ||||||
| ARIMA | 4.21 ± 0.15 | 892 ± 32 | 685 ± 25 | 18.52 ± 0.85 | 12.5 ± 0.6 | 8.2 ± 0.4 |
| XGBoost | 3.12 ± 0.08 | 654 ± 21 | 498 ± 16 | 14.23 ± 0.62 | 9.8 ± 0.4 | 6.5 ± 0.3 |
| LSTM | 2.89 ± 0.07 | 598 ± 18 | 452 ± 14 | 12.78 ± 0.55 | 8.5 ± 0.3 | 5.8 ± 0.2 |
| TFT | 2.58 ± 0.06 | 512 ± 15 | 385 ± 12 | 10.52 ± 0.45 | 7.2 ± 0.3 | 4.9 ± 0.2 |
| Informer | 2.51 ± 0.06 | 498 ± 15 | 375 ± 11 | 11.23 ± 0.48 | 7.8 ± 0.3 | 5.2 ± 0.2 |
| PatchTST | 2.45 ± 0.05 | 485 ± 14 | 365 ± 11 | 10.85 ± 0.48 | 7.4 ± 0.3 | 5.0 ± 0.2 |
| Foundation Models (Zero-Shot) | ||||||
| TimesFM | 2.78 ± 0.08 | 542 ± 18 | 412 ± 14 | 10.82 ± 0.52 | 7.5 ± 0.4 | 5.1 ± 0.2 |
| Chronos | 2.71 ± 0.07 | 528 ± 16 | 398 ± 13 | 10.21 ± 0.48 | 7.1 ± 0.3 | 4.8 ± 0.2 |
| Moirai-MoE | 2.63 ± 0.06 | 515 ± 15 | 388 ± 12 | 10.15 ± 0.45 | 7.0 ± 0.3 | 4.7 ± 0.2 |
| Foundation Models (Fine-Tuned) | ||||||
| TimesFM-FT | 2.52 ± 0.05 | 495 ± 14 | 372 ± 11 | 9.85 ± 0.42 | 6.8 ± 0.3 | 4.6 ± 0.2 |
| Chronos-FT | 2.48 ± 0.05 | 488 ± 13 | 368 ± 11 | 9.62 ± 0.40 | 6.6 ± 0.3 | 4.5 ± 0.2 |
| Moirai-MoE-FT | 2.38 ± 0.05 | 468 ± 12 | 352 ± 10 | 9.27 ± 0.38 | 6.4 ± 0.2 | 4.3 ± 0.2 |
| GridFM | 2.14 ± 0.05 ** | 418 ± 11 ** | 315 ± 9 ** | 7.80 ± 0.31 ** | 5.4 ± 0.2 ** | 3.6 ± 0.1 ** |
| Improv. vs. FT | 10.1% | 10.7% | 10.5% | 15.9% | 15.6% | 16.3% |
| Improv. vs. 0-shot | 18.6% | 18.8% | 18.8% | 23.2% | 22.9% | 23.4% |
| Model | Emission MAPE (%) | Emission RMSE | Emission MAE | Renewable MAPE (%) | Renewable RMSE (MW) | Renewable MAE (MW) |
|---|---|---|---|---|---|---|
| Traditional Methods | ||||||
| ARIMA | 12.85 ± 0.55 | 85 ± 4 | 62 ± 3 | 15.25 ± 0.65 | 425 ± 18 | 312 ± 14 |
| XGBoost | 8.52 ± 0.35 | 58 ± 3 | 42 ± 2 | 10.85 ± 0.45 | 325 ± 14 | 238 ± 10 |
| LSTM | 7.25 ± 0.30 | 52 ± 2 | 38 ± 2 | 9.52 ± 0.40 | 295 ± 12 | 215 ± 9 |
| TFT | 6.35 ± 0.25 | 45 ± 2 | 33 ± 1 | 8.25 ± 0.35 | 265 ± 11 | 192 ± 8 |
| Informer | 6.52 ± 0.28 | 48 ± 2 | 35 ± 2 | 8.65 ± 0.38 | 278 ± 12 | 202 ± 9 |
| PatchTST | 6.22 ± 0.25 | 44 ± 2 | 32 ± 1 | 8.12 ± 0.35 | 262 ± 11 | 190 ± 8 |
| Foundation Models (Zero-Shot) | ||||||
| TimesFM | 6.45 ± 0.26 | 46 ± 2 | 34 ± 2 | 8.35 ± 0.36 | 270 ± 12 | 196 ± 9 |
| Chronos | 6.32 ± 0.25 | 45 ± 2 | 33 ± 1 | 8.18 ± 0.35 | 265 ± 11 | 192 ± 8 |
| Moirai-MoE | 6.05 ± 0.24 | 43 ± 2 | 31 ± 1 | 7.92 ± 0.32 | 258 ± 10 | 187 ± 8 |
| Foundation Models (Fine-Tuned) | ||||||
| TimesFM-FT | 5.82 ± 0.22 | 42 ± 2 | 30 ± 1 | 7.58 ± 0.30 | 248 ± 10 | 180 ± 7 |
| Chronos-FT | 5.68 ± 0.21 | 41 ± 2 | 29 ± 1 | 7.42 ± 0.29 | 242 ± 10 | 176 ± 7 |
| Moirai-MoE-FT | 5.52 ± 0.20 | 39 ± 2 | 28 ± 1 | 7.25 ± 0.28 | 235 ± 9 | 170 ± 7 |
| GridFM | 4.73 ± 0.18 ** | 34 ± 1 ** | 24 ± 1 ** | 6.28 ± 0.24 ** | 205 ± 8 ** | 148 ± 6 ** |
| Improv. vs. FT | 14.3% | 12.8% | 14.3% | 13.4% | 12.8% | 12.9% |
| Improv. vs. 0-shot | 21.8% | 20.9% | 22.6% | 20.7% | 20.5% | 20.9% |
| Model | NYISO Native | NYISO Transfer | PJM Transfer | PJM Native | CAISO Transfer | CAISO Native |
|---|---|---|---|---|---|---|
| Moirai-MoE-FT | | – | | | | |
| GridFM | | – | | | | |
| Improvement | 10.1% | – | 13.0% | 9.9% | 10.3% | 10.1% |
| Transfer Degradation | – | – | +13.8% | – | +14.6% | – |
| Model | 1 h | 2 h | 4 h | 8 h | 12 h | 24 h |
|---|---|---|---|---|---|---|
| TFT | ||||||
| Moirai-MoE (0-shot) | ||||||
| Moirai-MoE-FT | ||||||
| GridFM | ||||||
| Improv. vs. 0-shot | 22.1% | 21.9% | 18.6% | 15.9% | 16.1% | 14.8% |
| Improv. vs. FT | 16.5% | 13.2% | 10.1% | 7.0% | 6.9% | 6.6% |
| Model | Load Winter | Load Spring | Load Summer | Load Fall | Price Winter | Price Spring | Price Summer | Price Fall |
|---|---|---|---|---|---|---|---|---|
| Moirai-MoE (0-shot) | 2.95 ± 0.07 | 2.48 ± 0.06 | 3.12 ± 0.08 | 2.58 ± 0.06 | 11.2 ± 0.5 | 9.5 ± 0.4 | 12.1 ± 0.6 | 9.8 ± 0.4 |
| Moirai-MoE-FT | 2.65 ± 0.06 | 2.22 ± 0.05 | 2.82 ± 0.06 | 2.32 ± 0.05 | 10.2 ± 0.4 | 8.6 ± 0.4 | 11.0 ± 0.5 | 8.9 ± 0.4 |
| GridFM | 2.35 ± 0.05 | 1.98 ± 0.04 | 2.52 ± 0.06 | 2.08 ± 0.05 | 8.5 ± 0.3 | 7.2 ± 0.3 | 9.1 ± 0.4 | 7.5 ± 0.3 |
| Improv. vs. 0-shot | 20.3% | 20.2% | 19.2% | 19.4% | 24.1% | 24.2% | 24.8% | 23.5% |
| Improv. vs. FT | 11.3% | 10.8% | 10.6% | 10.3% | 16.7% | 16.3% | 17.3% | 15.7% |
| Zone | TFT | Moirai-MoE | GridFM | Improv. |
|---|---|---|---|---|
| A—West | 18.7% | |||
| B—Genesee | 17.8% | |||
| C—Central | 18.4% | |||
| D—North | 17.8% | |||
| E—Mohawk Valley | 18.7% | |||
| F—Capital | 17.9% | |||
| G—Hudson Valley | 18.4% | |||
| H—Millwood | 18.0% | |||
| I—Dunwoodie | 18.4% | |||
| J—New York City | 19.2% | |||
| K—Long Island | 18.3% | |||
| System Total | 18.6% |
| Configuration | Load | Price | Emission | Renewable | Params |
|---|---|---|---|---|---|
| Base Moirai-MoE (0-shot) | 117 M | ||||
| +LoRA Fine-tuning | 119 M | ||||
| +FreqMixer Layer | 125 M | ||||
| +Physics Constraints | 128 M | ||||
| +Multi-Task Heads | 135 M | ||||
| +Coupling Loss | 135 M | ||||
| +Uncertainty Weighting | 135 M |
| Metric | TFT | Moirai-MoE | GridFM (No Phys.) | GridFM (Full) |
|---|---|---|---|---|
| Power Balance Violation (%) ↓ | ||||
| Price-Load Monotonicity (%) ↑ | ||||
| Emission-Renewable Inverse (%) ↑ | ||||
| Ramp Rate Compliance (%) ↑ |
| Parameter | Value 1 | Value 2 | Default | Value 4 | Value 5 |
|---|---|---|---|---|---|
| $\lambda_{\mathrm{physics}}$ | 0.01 | 0.05 | 0.1 | 0.2 | 0.5 |
| $\lambda_{\mathrm{coupling}}$ | 0.01 | 0.02 | 0.05 | 0.1 | 0.2 |
| LoRA rank r | 4 | 8 | 16 | 32 | 64 |
| Context L | 144 | 288 | 576 | – | – |
| Model | CRPS-Load | CRPS-Price | 90% Cov. | 95% Cov. | PI Width | Calib. |
|---|---|---|---|---|---|---|
| Chronos | 88.5% | 93.2% | 0.82 | |||
| Moirai-MoE | 89.8% | 94.5% | 0.87 | |||
| Chronos-FT | 88.8% | 93.5% | 0.84 | |||
| Moirai-MoE-FT | 89.5% | 94.2% | 0.88 | |||
| GridFM | ||||||
| Improv. vs. 0-shot | 16.7% | 22.2% | +0.4% | +0.7% | 16.9% | +8.0% |
| Improv. vs. FT | 12.9% | 17.6% | +0.7% | +1.0% | 14.7% | +6.8% |
| Model | Params (M) | Train (h) | Infer. (ms) | GPU (GB) | FLOPs (G) |
|---|---|---|---|---|---|
| LSTM | 2.5 | 8 | 12 | 2.1 | 0.8 |
| TFT | 5.2 | 15 | 28 | 4.5 | 2.2 |
| TimesFM | 200 | – | 85 | 12.5 | 45.2 |
| Chronos-Large | 710 | – | 120 | 24.0 | 125.5 |
| Moirai-MoE (0-shot) | 117 | – | 45 | 8.2 | 12.8 |
| Moirai-MoE-FT | 119 | 8 | 45 | 8.5 | 12.8 |
| GridFM | 135 | 12 | 52 | 9.5 | 15.2 |
| Event Type | Samples | Moirai-MoE | GridFM | Improv. |
|---|---|---|---|---|
| Normal Operations | 95.2% | 18.7% | ||
| Peak Load (>95th pctl) | 2.5% | 15.1% | ||
| Price Spikes (>3) | 1.2% | 20.0% | ||
| Extreme Weather | 1.1% | 13.9% |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Sayghe, A.; Mousa, M.A.; Batiyah, S.; Husawi, A.; Almuwallad, M. GridFM: A Physics-Informed Foundation Model for Multi-Task Energy Forecasting Using Real-Time NYISO Data. Energies 2026, 19, 357. https://doi.org/10.3390/en19020357

