3.1. Data
The NEM regions that were chosen, NSW1, QLD1, and VIC1 (as illustrated in
Figure 1), all exhibit materially different price behaviours, levels of renewable integration, market structures, and volatility profiles. For example, QLD1 is characterised by frequent high-price excursions linked to thermal generator bidding strategies, whereas VIC1 displays stronger wholesale coupling to wind-generation variability and interconnector constraints. NSW1, as the largest demand centre, presents comparatively smoother but structurally complex price dynamics.
All data used in this study was obtained from publicly available sources, ensuring full reproducibility. Electricity market observations and forecast data were sourced from the Australian Energy Market Operator (AEMO) [
34,
35], while meteorological observations and forecasts were obtained from Open-Meteo [
36]. While the evaluation focused on the NSW1, QLD1, and VIC1 NEM regions, the data collection and pipeline included all interconnected NEM regions, namely SA1 and TAS1 in addition. This comprehensive approach was taken because these regions are integral parts of the interconnected energy market [
37].
Figure 1.
Map of Australia showing the interconnected NEM regions. The main focus of this study is NSW1, which corresponds to the most populous state in Australia [
38]. Image adapted from “Australia Color Map” by Quickiebytes, Syed, Wikimedia Commons (accessed on 29 Nov 2025), licensed under CC BY-SA 3.0 [
39].
3.2. Preprocessing
Thirty-four months of data was collected, covering NEM operational information, including actual spot prices, operational demand, and net interchange, as well as the full suite of NEM thirty-minute-ahead pre-dispatch forecasts for price, demand, and net interchange. Weather variables, including temperature, humidity, wind speed, and cloud cover, were compiled for the capital city associated with each NEM region examined in this study. As no official benchmark dataset exists for NEM forecasting research, all data streams were manually merged into a consistent temporal frame.
All data was resampled or aggregated to a uniform thirty-minute resolution, ensuring strict timestamp alignment across the actual and forecast domains. Preprocessing involved parsing and flattening nested AEMO files, resolving daylight-saving irregularities, removing anomalies, interpolating missing meteorological measurements where necessary, and producing a chronologically ordered dataset. From this dataset, sequences of thirty-two steps at 30 min intervals were generated, and any sequence missing one or more of its 32 steps was dropped. The data was screened for implausibly large erroneous values, but none were detected. It was, however, important to retain genuine price spikes, caused by plant breakdowns and un-forecast weather extremes, since these are frequent features of the NEM. QuantileTransformer scalers were used to reduce the impact of such extremes on training and on the subsequent performance of the various models. The QuantileTransformer maps the empirical distribution of electricity prices to a smooth, approximately Gaussian space by transforming each value according to its quantile. This is especially useful for markets with rare extreme spikes, as it compresses heavy tails and stabilises model training while preserving the relative ordering of high-impact events. No temporal shuffling of sequences occurred prior to dataset splitting.
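The effect of the quantile scaling can be seen in a minimal sketch using scikit-learn's QuantileTransformer; the price series below is synthetic and illustrative, not the study's data:

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

# Hypothetical price series with one extreme spike, mimicking a NEM RRP event.
rng = np.random.default_rng(0)
prices = rng.normal(60.0, 15.0, size=(1000, 1))
prices[500] = 15000.0                      # spike retained rather than removed

qt = QuantileTransformer(output_distribution="normal", n_quantiles=500,
                         random_state=0)
scaled = qt.fit_transform(prices)

# The heavy tail is compressed into a bounded, approximately Gaussian range,
# while the spike remains the largest value (relative ordering preserved).
```

Because the transform is monotone, the spike still sits at the top of the scaled distribution, but its magnitude no longer dominates the loss during training.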
Feature engineering aimed to provide the models with variables known or hypothesised to influence NEM price formation [
19,
40]. These included time-based encodings capturing diurnal, weekly, and seasonal cycles. Weather features expected to have the greatest impact on regional electricity generation and consumption were chosen. RRP, demand, and net interchange features were included for all NEM regions, since all regions are interconnected and influence one another's prices. The complete list of features used is shown in
Table 1.
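The time-based encodings can be sketched as sine/cosine pairs over the diurnal cycle at 30 min resolution; the values below are illustrative, and the actual engineered feature set is the one listed in Table 1:

```python
import numpy as np
import pandas as pd

# Illustrative cyclical encoding of the daily cycle for one day of timestamps.
idx = pd.date_range("2023-01-01", periods=48, freq="30min",
                    tz="Australia/Sydney")
step = idx.hour * 2 + idx.minute // 30        # half-hour index 0..47
sin_day = np.sin(2 * np.pi * step / 48)
cos_day = np.cos(2 * np.pi * step / 48)

# The (sin, cos) pair places each half-hour on the unit circle, so 23:30 and
# 00:00 are encoded as neighbours instead of opposite ends of a linear scale.
```

Weekly and seasonal cycles follow the same pattern with periods of 336 half-hours and one year, respectively.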
3.3. Model Architectures
The primary forecasting model used in this study was a transformer architecture based on the seminal work of Vaswani et al. [
25], as illustrated in
Figure 3. Transformers make use of a multi-head self-attention mechanism in which attention heads learn to emphasise the most relevant parts of the input sequence at each timestep. This mechanism enables efficient modelling of long-range temporal dependencies, making transformers particularly effective for electricity price forecasting where patterns can span multiple hours or even days.
Transformers typically consist of an encoder and a decoder linked by a cross-attention mechanism. The encoder processes the historical sequences, such as RRP, demand, and weather actuals, while the decoder consumes the known future inputs, including AEMO pre-dispatch forecasts and weather forecasts. Both components may be stacked in multiple layers to provide increased representational capacity and enable the model to capture hierarchical temporal patterns across different forecast horizons.
In this study, the encoder was responsible for learning latent representations of past market behaviour, whereas the decoder integrated these representations with exogenous forward-looking signals to generate a full 16 h ahead forecast. Positional encodings were applied to both streams to ensure that the model retained awareness of the temporal order of the inputs, a crucial requirement given the irregular and highly dynamic nature of NEM spot prices. The combination of multi-head self-attention, cross-attention, and deeply stacked layers allowed the model to capture nonlinear interactions between historical drivers, forecast inputs, and evolving system conditions more effectively than recurrent or convolutional baselines.
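The fixed sinusoidal positional encodings of Vaswani et al. can be sketched as follows for the 96-step encoder stream and 32-step decoder stream; this is a generic illustration, not the authors' code:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Fixed sinusoidal encodings added to the inputs so the model retains
    awareness of temporal order (sketch of the standard formulation)."""
    pos = np.arange(seq_len)[:, None]            # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]         # (1, d_model / 2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                 # even dims: sine
    pe[:, 1::2] = np.cos(angles)                 # odd dims: cosine
    return pe

# 96 historical steps for the encoder, 32 future steps for the decoder.
pe_enc = sinusoidal_positional_encoding(96, 128)
pe_dec = sinusoidal_positional_encoding(32, 128)
```

Each position receives a unique, smoothly varying signature, which self-attention can exploit to distinguish, for example, an evening peak from a morning one.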
Since the original transformer was developed for sequence-to-sequence text translation, several modifications were made to adapt the architecture for numerical time-series forecasting and to improve training stability on NEM datasets. First, a pre-layer normalisation (pre-LN) formulation was adopted, following the stabilised architecture proposed by Wang et al. [
41]. Pre-LN significantly improves gradient stability during training and removes the need for the large learning-rate warm-up schedule used in the original transformer. Consequently, the warm-up phase was omitted entirely, and the model was successfully trained using the standard Adam optimiser [
42] with a fixed learning rate.
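The difference between the two normalisation placements can be sketched in a few lines; the sublayer here is a toy feed-forward map standing in for attention or the position-wise network:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def pre_ln_block(x, sublayer):
    # Pre-LN: normalise *before* the sublayer, then add an identity residual.
    # The untouched residual path keeps gradients well-scaled at depth,
    # which is what removes the need for learning-rate warm-up.
    return x + sublayer(layer_norm(x))

def post_ln_block(x, sublayer):
    # Original (post-LN) ordering, shown for contrast.
    return layer_norm(x + sublayer(x))

rng = np.random.default_rng(0)
x = rng.normal(size=(96, 128))                   # one encoded input sequence
ff = lambda h: np.maximum(h, 0.0) @ rng.normal(0, 0.02, (128, 128))
y_pre = pre_ln_block(x, ff)
y_post = post_ln_block(x, ff)
```

In the pre-LN form the residual stream is never renormalised, so a stack of such blocks behaves close to an identity map at initialisation.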
Second, the decoder was configured to use parallel decoding, enabling the model to generate the entire 32-step forecast horizon in a single forward pass. This approach avoids the accumulation of errors inherent in autoregressive decoding and aligns with operational forecasting needs in the NEM, where full multi-horizon price trajectories must be generated simultaneously. To assess the impact of decoding strategy, an additional autoregressive (AR) decoder variant was implemented and evaluated. The AR configuration predicted each future timestep sequentially, feeding earlier predictions back into the model, enabling direct comparison between parallel and AR decoding methods.
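The two decoding strategies can be contrasted with a toy sketch; the "model" below is a trivial persistence function used only to show the calling convention, not one of the study's models:

```python
import numpy as np

HORIZON = 32

def parallel_decode(model, history, future_cov):
    # One forward pass yields all 32 steps; no prediction is fed back in,
    # so errors cannot compound across the horizon.
    return model(history, future_cov)

def autoregressive_decode(model, history, future_cov):
    # The AR variant predicts one step at a time, appending each prediction
    # to the conditioning context before predicting the next step.
    context = list(history)
    preds = []
    for t in range(HORIZON):
        y_hat = model(np.asarray(context), future_cov[: t + 1])[-1]
        preds.append(y_hat)
        context.append(y_hat)
    return np.asarray(preds)

# Toy persistence "model" that repeats the last observed value.
toy = lambda hist, fut: np.full(HORIZON, hist[-1], dtype=float)
history = np.arange(96, dtype=float)
future = np.zeros(HORIZON)
y_par = parallel_decode(toy, history, future)
y_ar = autoregressive_decode(toy, history, future)
```

With a real model the AR loop costs 32 forward passes and propagates its own mistakes, whereas the parallel decoder emits the full trajectory in one pass.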
The default “small” encoder–decoder model consisted of three layers with four attention heads per layer, a hidden dimension of 128, a feed-forward dimension of 512, and a dropout rate of 0.05. It accepted an input sequence of ninety-six 30 min time steps and produced a forecast horizon of thirty-two 30 min time steps.
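These hyperparameters can be summarised as a configuration object; the field names are illustrative, not taken from the authors' code:

```python
from dataclasses import dataclass

@dataclass
class SmallTransformerConfig:
    # "Small" encoder-decoder variant as described above.
    num_layers: int = 3        # encoder and decoder layers
    num_heads: int = 4         # attention heads per layer
    d_model: int = 128         # hidden dimension
    d_ff: int = 512            # feed-forward dimension
    dropout: float = 0.05
    input_len: int = 96        # 96 x 30 min = 48 h of history
    horizon: int = 32          # 32 x 30 min = 16 h ahead

cfg = SmallTransformerConfig()
```

Note that d_model is divisible by num_heads, giving a per-head dimension of 32.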
To evaluate robustness and generalisation, a variety of architectural variants were tested, including tiny, small, medium, and large transformer models, as well as encoder-only and decoder-only configurations. A summary of these variants and their hyperparameters is provided in
Table 2 and shown in
Figure 4.
A generic two-layer Long Short-Term Memory (LSTM), a Patch Time Series Transformer (PatchTST) [
43], a TimesFM [
44], and a Temporal Fusion Transformer (TFT) [
45] were included for comparison. These models were chosen because they are modern, state-of-the-art representatives of each transformer configuration: TimesFM is decoder-only, PatchTST is encoder-only, and TFT contains both an encoder and a decoder. Each of the architectures is shown in
Figure 5.
A simple two-layer LSTM provides a strong baseline for short-term RRP forecasting because it captures sequential dependencies in load, price, and weather while remaining computationally lightweight. The first LSTM layer learns short-term structure, such as daily cycles, while the second layer captures longer-term patterns, including weekly and seasonal changes. LSTMs remain a well-established approach to this class of time-series problem [
11,
12].
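The recurrence at the heart of the baseline can be sketched as a single numpy LSTM cell stacked twice; sizes and weights below are toy values for illustration only:

```python
import numpy as np

def lstm_cell(x, h, c, W, U, b):
    """One LSTM cell step (numpy sketch; gate order: input, forget,
    candidate, output). W: (4H, D), U: (4H, H), b: (4H,)."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = 1.0 / (1.0 + np.exp(-z[:H]))            # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2 * H]))       # forget gate
    g = np.tanh(z[2 * H:3 * H])                 # candidate cell state
    o = 1.0 / (1.0 + np.exp(-z[3 * H:]))        # output gate
    c_new = f * c + i * g                       # gated cell-state update
    h_new = o * np.tanh(c_new)
    return h_new, c_new

# Two stacked layers: layer 1 consumes the input features, layer 2 consumes
# layer 1's hidden state, mirroring the short-term / longer-term split.
rng = np.random.default_rng(0)
D, H = 8, 16                                    # toy feature / hidden sizes
params = [(rng.normal(0, 0.1, (4 * H, d)), rng.normal(0, 0.1, (4 * H, H)),
           np.zeros(4 * H)) for d in (D, H)]
h = [np.zeros(H), np.zeros(H)]
c = [np.zeros(H), np.zeros(H)]
for _ in range(96):                             # 48 h of 30 min inputs
    x = rng.normal(size=D)
    h[0], c[0] = lstm_cell(x, h[0], c[0], *params[0])
    h[1], c[1] = lstm_cell(h[0], h[1], c[1], *params[1])
```

In practice a framework implementation (e.g. a two-layer recurrent module with a dense output head) would replace this hand-rolled cell.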
The TFT architecture combines LSTM encoders with multi-head attention and explicit variable-selection layers, making it well-suited to RRP forecasting, where both past conditions and future covariates, such as pre-dispatch forecasts, weather projections, and outages, influence price formation. Its LSTM layers learn local patterns, and its attention mechanisms help identify which drivers matter at longer horizons, offering good interpretability and often strong performance when diverse feature sets are available. The traditional quantile output head was replaced with a single dense output head, since point-price predictions were required.
TimesFM applies a pretrained large-scale foundation model with stacked self-attention layers and a dedicated forecast head, allowing it to extract generalisable temporal patterns from NEM data. Its ability to transfer patterns learned from massive global time-series corpora makes it effective for medium-horizon RRP prediction, where structural noise, demand cycles, and renewable variability dominate. It was used in zero-shot mode, without any fine-tuning: it generates multi-step forecasts by conditioning only on the input sequence and forecast horizon, making it highly adaptable to unseen time series with minimal task-specific configuration. Fine-tuning nonetheless remains a potential avenue for improvement.
PatchTST excels in RRP forecasting by processing inputs as overlapping temporal patches, allowing the model to focus on localised variations, such as ramp events, solar troughs, wind lulls, and rebidding episodes, while efficiently capturing long-range structure through attention. Its channel-independent design, in which each feature is viewed in its own channel, cleanly handles multivariate NEM inputs and often improves robustness, particularly when the target series exhibits nonlinear seasonality or regime changes.
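The patching step that gives PatchTST its name can be sketched in a few lines; the patch length and stride below are illustrative defaults, not necessarily the values used in this study:

```python
import numpy as np

def make_patches(series, patch_len=16, stride=8):
    """Split a univariate series into overlapping patches (PatchTST-style);
    each patch is later embedded as one attention token."""
    starts = range(0, len(series) - patch_len + 1, stride)
    return np.stack([series[s:s + patch_len] for s in starts])

x = np.arange(96.0)              # one channel of the 96-step input window
patches = make_patches(x)
# 96 steps -> 11 overlapping patches of length 16: attention operates over
# 11 tokens instead of 96 timesteps, while local detail is kept inside each
# patch. Under channel independence, every feature is patched separately.
```

Tokenising patches rather than single timesteps shortens the attention sequence quadratically while preserving the localised ramp and spike structure the text describes.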