1. Introduction
Accurate and robust forecasting of gas consumption has become increasingly critical in the context of modern energy systems, in which supply reliability, market efficiency, and environmental responsibility intersect.
With the growing complexity of consumption behavior, influenced by factors such as urbanization, climate variability, and diversified consumer categories, traditional forecasting approaches often fall short in capturing the underlying dynamics of energy demand.
In response to these challenges, deep neural network (DNN) models have emerged as a promising alternative, offering the capacity to model complex temporal relationships and non-linear dependencies. Architectures such as Seq2Seq with attention, TiDE, and Temporal Fusion Transformers (TFT) enable learning from rich, multi-dimensional datasets that include historical consumption, external variables like temperature, and categorical consumer attributes.
The potential of DNNs extends beyond technical innovation; it holds tangible value in meeting the regulatory and operational demands of the gas sector. Regulators increasingly require energy providers to deliver accurate, transparent, and timely forecasts that support network stability, market transparency, and environmental compliance.
From forecasting aggregated demand for capacity planning to ensuring adherence to balancing responsibilities and emission targets, DNN-based models can provide both the precision and interpretability needed for modern regulatory frameworks.
To address the challenges of large-scale, regulatory-relevant gas consumption forecasting, we selected three deep learning architectures—Seq2SeqPlus, TiDE, and Temporal Fusion Transformer (TFT)—based on their proven ability to handle multivariate time series, support structured inputs, and offer model transparency. Unlike earlier models such as DeepAR, N-BEATS, or Informer, which focus on probabilistic autoregression, pure trend decomposition, or long-range attention alone, the chosen architectures provide a balance of temporal structure modeling, scalability across diverse consumers, and interpretability through attention mechanisms or SHAP-based feature attribution. These characteristics make them especially suitable for real-world deployment under data quality constraints and regulatory requirements.
This study evaluates and compares these three DNN architectures using a large-scale, real-world dataset of over 100,000 consumers, aiming to determine their practical viability for gas consumption forecasting in operational and regulatory settings.
Focus is placed on how data quality, feature attribution and model robustness influence performance—providing actionable insights for real-world deployment.
Mid- and long-term forecasting of natural gas consumption is of crucial importance under the current conditions of Ukraine’s natural gas market as an integral part of the European energy market. In this context, accurate forecasting ensures effective interaction among market participants, helps reduce operational and transaction costs, and mitigates the risks of negative regulatory effects [1].
In particular, the predicted consumption and distribution volumes of natural gas are used by the state regulatory authority—the National Energy and Utilities Regulatory Commission—when setting tariffs for gas distribution system operators and the gas transmission system operator.
Currently, existing methodologies involve using historical data for tariff calculations, with periodic adjustments based on actual data from previous periods (mostly a calendar year). This approach creates discriminatory effects on consumers and leads to cash flow gaps for DSOs.
Additionally, forecasting is essential for natural gas suppliers and traders to balance and reserve capacities. Furthermore, inaccurate forecasting increases suppliers’ costs due to penalties and the need to purchase additional gas volumes on the spot market [2].
Recent advancements in gas consumption forecasting leverage a variety of machine learning (ML) techniques and hybrid models to enhance predictive accuracy. These approaches utilize historical consumption data, meteorological factors, and even social variables to create robust forecasting models. The following sections outline the key methodologies currently employed in the field.
Machine Learning Techniques:
Deep Learning Models: Techniques such as Long Short-Term Memory (LSTM) and Deep Neural Networks (DNN) have shown significant promise. For instance, a DNN incorporating social factors outperformed traditional models in forecasting natural gas consumption in Greece [3].
Hybrid Models: Combining statistical methods with ML, such as Facebook’s Prophet and the Holt–Winters method, has improved accuracy in predicting natural gas demand [4].
Change Point Detection:
Dynamic Adaptation—the integration of change point detection mechanisms allows models to adapt to shifts in consumption patterns, enhancing forecasting reliability in real-time scenarios [5].
Comparative Studies:
Model Evaluation—a comprehensive review of various forecasting strategies highlights the effectiveness of hybrid models and the importance of data decomposition methods in improving prediction accuracy [6].
Fan et al. proposed a deep reinforcement learning framework integrating demand forecasting and dynamic pricing for natural gas pipeline networks, optimizing system performance under physical constraints. Their method highlights the potential of combining deep learning and decision-making for demand response in complex gas systems [7].
While these advanced techniques significantly enhance forecasting capabilities, challenges remain, particularly in data privacy and the interpretability of complex models. Addressing these issues will be crucial for the future development of gas consumption forecasting methodologies.
2. Actual Regulatory Use Cases of Consumption Forecasting
There are several major use cases for proper forecasting of natural gas consumption, engaging all the participants of the gas market—consumers, system operators, suppliers, and the state regulatory authority. Let us consider them in detail. The first one is Forecasting for Tariff Setting. The National Energy and Utilities Regulatory Commission (hereinafter, NEURC) uses gas consumption forecasts to determine fair and economically justified tariffs for gas distribution and transportation.
Tariffs for gas distribution system operators (DSOs) and the gas transmission system operator (TSO) are set for a period of 1 to 5 years based on predicted gas consumption and transmission capacity volumes. In this case, precise long-term forecasts of gas consumption help estimate expected revenues and costs for network operators, ensuring that tariffs cover operational expenses while maintaining affordability for consumers [8,9].
However, there are certain challenges with the current approach, such as historical data dependency, delayed adjustments, and consumer inequality. The current methodology relies on past consumption data to determine tariffs, with periodic adjustments based on actual figures. This can be problematic in times of fluctuating demand.
If actual consumption deviates from forecasts, tariff corrections may take a full calendar year, leading to financial imbalances for operators. In addition, inaccurate forecasting can result in some consumer groups overpaying or underpaying, creating a discriminatory effect on different types of consumers.
The second use case considers Forecasting for Balancing and Capacity Reservation, stemming from the Gas Transmission System Code requirements [10]. Suppliers and network operators use day- and month-ahead gas consumption forecasts to ensure the gas system remains balanced and to optimize capacity reservation. Natural gas supply must match demand in real time to maintain stable system pressure and prevent shortages or excess gas buildup, while suppliers must forecast demand accurately to avoid imbalances, which could lead to financial penalties.
While Capacity Reservation is maintained, gas suppliers must reserve transmission and distribution capacities in advance based on their expected demand. Incorrect forecasts may result in overbooking (leading to unnecessary costs) or underbooking (causing supply shortages and urgent, expensive spot market purchases).
Financial and operational risks may occur in the case of poor forecasting. If suppliers do not accurately predict consumption, they may face penalties for imbalances. Unexpected demand surges require suppliers to purchase additional gas on the spot market, which often has significantly higher prices than long-term contracts. Inaccurate forecasts can create cash flow gaps, where suppliers either hold excess gas they cannot sell or face sudden high costs to meet unexpected demand.
Therefore, accurate gas consumption forecasting is critical for both regulatory tariff-setting and market-based balancing operations. For regulatory authorities and consumers, it ensures stable, fair tariffs that reflect actual market conditions. For suppliers, it minimizes costs, prevents penalties, and improves financial planning. Now, let us review up-to-date forecasting solutions applied to the subject of our research.
Our current study aims to find the best extant solutions to satisfy both short- and long-term forecasting demands. The key consideration for the possible approaches is data, including its availability, robustness, and relevance. The volumes of data available to gas market participants and the state regulatory authority are enormous and constantly growing, although only a small part of them can be used for forecasting purposes (Figure 1).
The TSOUA, as the authorized operator of the Information Platform, possesses large amounts of data on all the consumers of Ukraine—in particular, their territorial location, consumption volumes, and affiliation with DSOs and gas suppliers [11].
In turn, DSOs and suppliers have more granular data, but only for the consumers to whom they provide their services. At the same time, regulatory factors, such as the requirements of regulatory legal acts (for example, the methodology for calculating the volume of gas consumed in the absence of meter readings), and existing technological limitations, such as the low coverage of household consumers with remote-reading meters and the inability of DSO controllers to reach all consumers, cause this data to be incomplete [12].
The NEURC consolidates these data through regulatory reporting submitted by market participants, forming a holistic picture of the regulated market while making the information more general and publicly available. Meanwhile, current legislative restrictions regarding the protection of personal data and commercial concerns may further limit their use for forecasting purposes, which is to be mitigated by amending existing regulations. Consequently, the corresponding models must accommodate this incompleteness of the data, or additional data wrangling and preprocessing solutions must be applied, which can lead to a significant increase in time and computational requirements.
The structure of consumer types and their share in the total consumption volume differs significantly in certain regions of Ukraine—commercial consumers prevail in the East and Center, and household consumers in the West and South.
This may be another significant factor that may influence the choice of forecasting approaches. Commercial consumers demonstrate more stable patterns of consumer behavior with low seasonality, while the patterns of household and utility consumers, on the contrary, have a pronounced seasonality and dependence on environmental temperature. Moreover, within households, separate groups whose consumption patterns are extremely different can be distinguished. This depends primarily on the purpose of consumption (home heating, water heating, cooking), volumes of consumption and housing floor area.
Under the conditions of state regulation of prices and tariffs for household consumers, which is currently applied in Ukraine, the natural gas cost factor has a negligible statistical impact on their consumption level, and therefore we will not use it in this study [13].
Summarizing the above, we can outline the framework of our requirements for predictive approaches—the ability to process large volumes of raw incomplete data, detect and consider seasonal trends, as well as accept additional features—environmental temperature, category, etc.
Moreover, we need a single predictive model that considers all of the more than 100,000 consumers’ unique time series simultaneously, attempting to capture the core patterns that govern the series and thereby mitigating the potential noise that each series might introduce. At the output, the solutions should provide short- and long-term forecasts of consumption volumes for distinct consumers, for consumer categories, and, in general, for certain regions. Since the task is quite complex, in our research we will narrow the scope and focus on the most essential part of it—household consumption forecasting.
3. Materials and Methods
Currently, time-series forecasting problems can be solved using a wide range of approaches. Among them are classical statistical approaches: ARIMA, which is effective for short-term forecasting when consumption data follow a trend and seasonality, and SARIMA, an extension of ARIMA that accounts for seasonal patterns in gas demand. Exponential smoothing (the Holt–Winters method), which is useful for capturing trends and seasonal variations, can also be added to the list, as can regression models: Multiple Linear Regression (MLR), which relates gas consumption to external variables such as temperature, industrial activity, and population, and Generalized Additive Models (GAMs), which are more flexible than linear regression, allowing non-linear relationships between predictors.
These methods have been widely used due to their interpretability and relatively low computational requirements and deliver an acceptable level of precision. A significant limitation of these methods is the inability to train global models that allow considering the behavior of many consumers simultaneously, and, therefore, they are unsuitable for achieving our goals.
In the last decade, due to the impressive development of computing power, completely new approaches have appeared, which are based on machine learning and artificial intelligence.
Support Vector Machines (SVMs), for example, are useful for medium-term forecasting, especially when the relationships between variables are complex; Random Forest and Gradient Boosting Trees (XGBoost 3.0.2, Tianqi Chen, University of Washington; LightGBM 4.6.0, Microsoft Corporation, Redmond, WA, USA) handle non-linear relationships well and can incorporate many external variables; Feedforward Neural Networks (FNNs) are effective for learning complex patterns but require significant data; and Long Short-Term Memory networks (LSTMs) and Gated Recurrent Units (GRUs) are well suited to sequential and time-series forecasting. However, the most cutting-edge solutions are based on the concept of Deep Neural Networks [14,15].
These methods require huge amounts of data and immense computational facilities, but in return they can deliver a higher level of prediction accuracy.
Deep neural network methods in gas consumption forecasting. Let us now turn to the most up-to-date realm of time-series forecasting approaches, deep neural network models: in particular, the Time-series Dense Encoder, the Temporal Fusion Transformer, and Sequence-to-Sequence plus Attention.
The evolution of deep learning models for time series forecasting reflects a gradual shift from sequence modeling roots in natural language processing to highly specialized architectures tailored for temporal data. Sequence-to-Sequence (Seq2Seq) models were initially developed for machine translation and later adapted to time series forecasting. By incorporating the attention mechanism, these models overcame limitations of fixed-length context vectors, allowing them to selectively focus on relevant time steps—an innovation that improved their ability to handle long-term dependencies.
Building on these foundations, the Temporal Fusion Transformer (TFT) was introduced as a purpose-built architecture for interpretable multivariate time series forecasting. It fully integrates attention mechanisms—not only temporally, but also across input features—combined with gating and variable selection layers to enhance both performance and interpretability. TFT represents a significant leap in leveraging attention for structured time-series tasks.
In contrast, TiDE (Time-series Dense Encoder) emerged more recently as a minimalist, fully feedforward alternative to attention-based models. Eschewing traditional sequence-to-sequence and attention architectures, TiDE employs learned temporal embeddings and multi-layer perceptrons (MLPs) to capture time-dependent patterns.
It is designed to be computationally efficient and particularly well-suited for long-horizon forecasting, showing that competitive accuracy can be achieved without explicit recurrence or attention.
Time-series Dense Encoder (TiDE) [16]
The deep learning model TiDE is designed to address the limitations of both linear models and Transformer-based approaches in long-term time-series forecasting [17].
While recent work demonstrated that simple linear models could outperform Transformers on certain benchmarks, linear methods fail to capture non-linear dependencies or leverage covariates effectively. TiDE bridges this gap by introducing a dense Multi-Layer Perceptron (MLP)-based encoder–decoder framework that handles non-linear patterns while maintaining scalability. TiDE encodes the past of a time series along with covariates using dense MLPs and then decodes the encoded time series along with future covariates.
Key Innovations. TiDE employs channel-independent processing, where each $i$-th time series is modeled separately using its past observations $y_{1:L}^{(i)}$, dynamic covariates $x_{1:L+H}^{(i)}$, and static attributes $a^{(i)}$, while sharing global weights across the dataset, i.e.,
$$\hat{y}_{L+1:L+H}^{(i)} = f\left( y_{1:L}^{(i)},\ x_{1:L+H}^{(i)},\ a^{(i)} \right), \quad i \in \mathcal{I},$$
where $\mathcal{I}$ denotes the set of unique entities in a given time-series dataset and $f$ is a forecasting model.
The architecture relies on residual MLP blocks, which consist of a ReLU-activated hidden layer, a linear skip connection, and dropout with layer normalization for stable training.
The model operates in two phases: an encoding phase that compresses historical data and covariates into a low-dimensional latent representation, and a decoding phase that maps this representation to future predictions using projected future covariates. This design ensures efficient non-linear modeling while maintaining interpretability and scalability.
TiDE Architecture. Encoding Stage. Feature Projection. Dynamic covariates $x_t^{(i)}$ of time series $i$ at time $t$ are compressed via a residual block to a reduced dimension $\tilde{r}$:
$$\tilde{x}_t^{(i)} = \mathrm{ResidualBlock}\left( x_t^{(i)} \right).$$
This avoids the high dimensionality of flattened raw covariates.
Dense Encoder. The encoder stacks projected covariates, static attributes, and past observations, processing them through $n_e$ residual blocks:
$$e^{(i)} = \mathrm{Encoder}\left( y_{1:L}^{(i)};\ \tilde{x}_{1:L+H}^{(i)};\ a^{(i)} \right).$$
Decoding Stage. Dense Decoder. The latent representation $e^{(i)}$ is transformed through $n_d$ residual blocks into a matrix $D^{(i)} \in \mathbb{R}^{p \times H}$, where each column $d_t^{(i)}$ corresponds to a future time step:
$$D^{(i)} = \mathrm{Decoder}\left( e^{(i)} \right).$$
Temporal Decoder. For each horizon step $t$, a residual block combines $d_t^{(i)}$ with the projected future covariates $\tilde{x}_{L+t}^{(i)}$:
$$\hat{y}_{L+t}^{(i)} = \mathrm{TemporalDecoder}\left( d_t^{(i)};\ \tilde{x}_{L+t}^{(i)} \right).$$
This “highway” connection ensures direct covariate influence.
Global Residual Connection. A linear projection of the look-back window is added to the predictions, ensuring that the model subsumes linear baselines.
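To make the data flow concrete, the following is a minimal PyTorch sketch of a TiDE-style residual MLP block and dense encoder–decoder with a global residual connection. The class names, layer sizes, and the simplified handling of the temporal decoder (folded here into a single horizon-wide output) are illustrative assumptions, not the reference implementation.
```python
# Minimal sketch of a TiDE-like model; dimensions and names are illustrative assumptions.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """ReLU-activated hidden layer, linear skip connection, dropout, and layer normalization."""
    def __init__(self, in_dim, hidden_dim, out_dim, dropout=0.1):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, out_dim), nn.Dropout(dropout),
        )
        self.skip = nn.Linear(in_dim, out_dim)   # linear skip connection
        self.norm = nn.LayerNorm(out_dim)

    def forward(self, x):
        return self.norm(self.mlp(x) + self.skip(x))

class TiDELikeModel(nn.Module):
    """Projects covariates, encodes past targets + covariates + static attributes
    with stacked residual blocks, decodes the full horizon, and adds a linear
    projection of the look-back window as a global residual."""
    def __init__(self, lookback=24, horizon=12, cov_dim=2, static_dim=1,
                 proj_dim=4, hidden=128, latent=64):
        super().__init__()
        self.feature_proj = ResidualBlock(cov_dim, hidden, proj_dim)
        enc_in = lookback + (lookback + horizon) * proj_dim + static_dim
        self.encoder = nn.Sequential(
            ResidualBlock(enc_in, hidden, hidden),
            ResidualBlock(hidden, hidden, latent),
        )
        self.decoder = nn.Sequential(
            ResidualBlock(latent, hidden, hidden),
            ResidualBlock(hidden, hidden, horizon),
        )
        self.global_residual = nn.Linear(lookback, horizon)  # linear baseline path

    def forward(self, y_past, covariates, static_attr):
        # y_past: (B, lookback); covariates: (B, lookback+horizon, cov_dim); static_attr: (B, static_dim)
        x_proj = self.feature_proj(covariates)                     # (B, L+H, proj_dim)
        enc_in = torch.cat([y_past, x_proj.flatten(1), static_attr], dim=-1)
        y_hat = self.decoder(self.encoder(enc_in))                 # (B, horizon)
        return y_hat + self.global_residual(y_past)                # global residual connection
```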
Training and Evaluation. TiDE is trained using mini-batch gradient descent with root mean square error (RMSE) loss, where each batch contains multiple time-series segments consisting of look-back windows and their corresponding forecast horizons, allowing overlapping training sequences for comprehensive learning. The model’s performance is evaluated through rolling-window validation, testing all possible consecutive (look-back, horizon) pairs in the test set to thoroughly assess forecasting accuracy. This evaluation approach, consistent with established time-series forecasting practices, can also be applied to a validation set for hyperparameter optimization and model selection.
Temporal Fusion Transformer (TFT) [18]
The Temporal Fusion Transformer (TFT) is designed with specialized components to handle diverse input types (static, known, and observed) for robust time-series forecasting. It adopts quantile regression for multi-horizon forecasting. Each quantile forecast takes the form
$$\hat{y}_i(q, t, \tau) = f_q\left( \tau,\ y_{i, t-k:t},\ z_{i, t-k:t},\ x_{i, t-k:t+\tau},\ s_i \right), \quad i \in \mathcal{I},$$
where $\hat{y}_i(q, t, \tau)$ is the predicted $q$-th quantile of the $\tau$-step-ahead forecast for entity $i$ at time $t$, $f_q(\cdot)$ is a forecasting model, $\tau$ is the forecast horizon, $y_{i, t-k:t}$ are past targets, $z_{i, t-k:t}$ are past unknown observed inputs, $x_{i, t-k:t+\tau}$ are past and future known inputs, $s_i$ are static covariates, and $\mathcal{I}$ is the set of unique entities in a given time-series dataset. The model outputs probabilistic forecasts through quantile predictions (10th, 50th, and 90th percentiles), providing both point estimates and uncertainty quantification.
Key Innovations. The TFT hybrid architecture strategically combines long short-term memory (LSTM) layers, which excel at capturing local temporal patterns, with Transformer components that model long-range dependencies, creating a powerful framework for complex forecasting tasks.
For enhanced interpretability, TFT employs variable selection networks to identify important features and multi-head attention to reveal meaningful temporal relationships like seasonality trends. These innovations collectively enable TFT to handle complex forecasting tasks while maintaining computational efficiency and model transparency.
TFT Architecture. TFT employs gating mechanisms to dynamically adjust network complexity, variable selection networks to prioritize relevant features at each step, and static encoders to integrate time-invariant metadata. For temporal processing, TFT combines a sequence-to-sequence layer for local patterns with an interpretable multi-head attention block for long-range dependencies.
Gating mechanisms. To dynamically assess the importance of features, TFT employs Gated Residual Networks (GRNs). The GRN operates on an input vector $a$ and an optional context vector $c$ through the following formulation:
$$\mathrm{GRN}_{\omega}(a, c) = \mathrm{LayerNorm}\left( a + \mathrm{GLU}_{\omega}(\eta_1) \right),$$
$$\eta_1 = W_{1,\omega}\, \eta_2 + b_{1,\omega},$$
$$\eta_2 = \mathrm{ELU}\left( W_{2,\omega}\, a + W_{3,\omega}\, c + b_{2,\omega} \right),$$
where ELU is the Exponential Linear Unit activation function, $\eta_1, \eta_2 \in \mathbb{R}^{d_{\mathrm{model}}}$ are intermediate layers, LayerNorm is standard layer normalization, and $\omega$ is an index to denote weight sharing [19,20]. Gated Linear Units (GLUs) for an input $\gamma$ take the form
$$\mathrm{GLU}_{\omega}(\gamma) = \sigma\left( W_{4,\omega}\, \gamma + b_{4,\omega} \right) \odot \left( W_{5,\omega}\, \gamma + b_{5,\omega} \right),$$
where $\sigma(\cdot)$ is the sigmoid activation function, $W_{(\cdot)}$ and $b_{(\cdot)}$ are the weights and biases, $\odot$ is the element-wise Hadamard product, and $d_{\mathrm{model}}$ is the hidden state size.
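As an illustration, the following is a minimal PyTorch sketch of the GLU and GRN defined above; the equal input and output dimensions and the class names are simplifying assumptions.
```python
# Minimal sketch of TFT-style gating components; sizes and names are illustrative.
import torch
import torch.nn as nn

class GLU(nn.Module):
    """Gated Linear Unit: sigmoid(W4 x + b4) elementwise-multiplied by (W5 x + b5)."""
    def __init__(self, d_model):
        super().__init__()
        self.gate = nn.Linear(d_model, d_model)
        self.value = nn.Linear(d_model, d_model)

    def forward(self, gamma):
        return torch.sigmoid(self.gate(gamma)) * self.value(gamma)

class GRN(nn.Module):
    """Gated Residual Network: LayerNorm(a + GLU(eta1)),
    with eta1 = W1 eta2 + b1 and eta2 = ELU(W2 a + W3 c + b2)."""
    def __init__(self, d_model):
        super().__init__()
        self.w2 = nn.Linear(d_model, d_model)
        self.w3 = nn.Linear(d_model, d_model, bias=False)  # optional context projection
        self.w1 = nn.Linear(d_model, d_model)
        self.glu = GLU(d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, a, c=None):
        eta2 = self.w2(a) if c is None else self.w2(a) + self.w3(c)
        eta2 = nn.functional.elu(eta2)
        eta1 = self.w1(eta2)
        return self.norm(a + self.glu(eta1))       # residual connection with gating
```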
Variable selection networks. TFT determines the importance of each input variable through its variable selection networks, which analyze both static features and time-varying inputs for every prediction. This process reveals which factors most influence forecasts and filters out irrelevant or noisy data that could reduce accuracy.
The model computes variable selection weights by processing both the flattened vector $\Xi_t$ of all past inputs at time $t$ and an external context vector $c_s$ through a GRN, followed by a Softmax normalization:
$$v_{\chi t} = \mathrm{Softmax}\left( \mathrm{GRN}_{v_\chi}\left( \Xi_t, c_s \right) \right),$$
where $v_{\chi t} \in \mathbb{R}^{m_\chi}$ represents the vector of variable importance weights and $c_s$ is the static context vector from the static covariate encoder. For static variables, $c_s$ is excluded since they already contain static information.
Each feature $\xi_t^{(j)}$ undergoes additional non-linear processing at each time step via its own GRN:
$$\tilde{\xi}_t^{(j)} = \mathrm{GRN}_{\tilde{\xi}(j)}\left( \xi_t^{(j)} \right).$$
The processed features are then combined using the selection weights:
$$\tilde{\xi}_t = \sum_{j=1}^{m_\chi} v_{\chi t}^{(j)}\, \tilde{\xi}_t^{(j)},$$
where $v_{\chi t}^{(j)}$ is the $j$-th element of the weight vector $v_{\chi t}$.
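A hedged sketch of this variable selection step is given below, reusing the GRN class from the previous snippet. Unlike the original formulation, in which the weight-producing GRN maps directly to the $m_\chi$ weights, a small linear head is added here for simplicity; the context broadcasting is likewise a simplification.
```python
# Minimal sketch of TFT-style variable selection; dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class VariableSelection(nn.Module):
    def __init__(self, n_vars, d_model):
        super().__init__()
        self.weight_grn = GRN(n_vars * d_model)            # operates on the flattened inputs
        self.to_weights = nn.Linear(n_vars * d_model, n_vars)
        self.var_grns = nn.ModuleList([GRN(d_model) for _ in range(n_vars)])

    def forward(self, xi, c_s=None):
        # xi: (B, n_vars, d_model) embedded inputs at one time step; c_s: (B, d_model) or None
        flat = xi.flatten(1)
        ctx = None if c_s is None else c_s.repeat(1, xi.shape[1])     # simple context broadcast
        weights = torch.softmax(self.to_weights(self.weight_grn(flat, ctx)), dim=-1)
        processed = torch.stack([g(xi[:, j]) for j, g in enumerate(self.var_grns)], dim=1)
        return (weights.unsqueeze(-1) * processed).sum(dim=1)         # weighted combination, (B, d_model)
```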
Static covariate (attribute feature) encoders. TFT uses dedicated GRN encoders to transform static metadata into context vectors that guide temporal variable selection, local feature processing, and static–temporal fusion in the decoder.
Multi-head attention mechanism. Attention mechanisms scale values based on relationships between keys and queries in the following way:
$$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\left( \frac{Q K^{\top}}{\sqrt{d_{\mathrm{attn}}}} \right) V,$$
where $Q$ (queries) represents the current focus of the attention mechanism, $K$ (keys) encodes the content of all time steps, $V$ (values) contains the actual information to aggregate, and $\sqrt{d_{\mathrm{attn}}}$ is a scaling factor for stable gradients.
The multi-head attention approach improves upon standard attention by employing parallel attention heads that each focus on distinct feature representations. The outputs of different heads are then combined via concatenation [21].
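The sketch below implements the scaled dot-product formula above together with a plain multi-head wrapper. Note that TFT's interpretable variant additionally shares the value weights across heads, which is omitted here; all sizes are illustrative assumptions.
```python
# Minimal sketch of scaled dot-product and multi-head attention.
import math
import torch
import torch.nn as nn

def scaled_dot_product_attention(q, k, v):
    # q: (..., T, d); k, v: (..., S, d)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))   # scaled similarity of queries and keys
    return torch.softmax(scores, dim=-1) @ v                   # weighted aggregation of values

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, n_heads):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, q, k, v):
        B, T, _ = q.shape
        def split(x):  # (B, len, d_model) -> (B, heads, len, d_head)
            return x.view(B, -1, self.n_heads, self.d_head).transpose(1, 2)
        heads = scaled_dot_product_attention(split(self.q_proj(q)),
                                             split(self.k_proj(k)),
                                             split(self.v_proj(v)))
        concat = heads.transpose(1, 2).reshape(B, T, -1)        # concatenate the heads
        return self.out(concat)
```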
The temporal fusion decoder stacks several layers, including a locality-enhancement sequence-to-sequence layer, a static enrichment layer, interpretable multi-head self-attention, and a position-wise feed-forward layer, to learn the temporal relationships present in the data.
Quantile outputs. TFT produces prediction intervals alongside point forecasts by directly outputting multiple percentiles (e.g., 10th, 50th, 90th) at each time step, computed via a linear transformation of the temporal fusion decoder’s output.
Training and Evaluation. TFT is trained using a quantile loss function that jointly optimizes prediction percentiles, enabling probabilistic forecasting with uncertainty intervals. During evaluation, TFT employs rolling-window validation to assess performance across all forecast horizons, while attention weights and variable selection networks provide interpretable insights into feature importance and temporal patterns.
Compared to TiDE’s RMSE-based training, TFT’s quantile loss offers richer uncertainty quantification but requires more computation due to its LSTM and attention components. The model’s performance is measured through horizon-specific metrics including quantile coverage, mean absolute error (MAE) for median predictions, and analysis of attention patterns for temporal dependencies.
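For reference, a minimal sketch of the quantile (pinball) loss that such training jointly optimizes over the 10th, 50th, and 90th percentiles is shown below; the tensor shapes are illustrative assumptions.
```python
# Minimal sketch of the quantile (pinball) loss averaged over several percentiles.
import torch

def quantile_loss(y_true, y_pred, quantiles=(0.1, 0.5, 0.9)):
    # y_true: (B, H); y_pred: (B, H, n_quantiles), one output column per quantile
    losses = []
    for i, q in enumerate(quantiles):
        err = y_true - y_pred[..., i]
        # penalize under-prediction with weight q and over-prediction with weight (1 - q)
        losses.append(torch.max(q * err, (q - 1) * err))
    return torch.mean(torch.stack(losses))
```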
Sequence-to-Sequence Plus Attention (Seq2SeqPlus) [22,23]
Key Innovations. Seq2SeqPlus introduces critical enhancements over the traditional Seq2Seq model, primarily through the attention mechanism, which allows dynamic focus on relevant input tokens during decoding, improving handling of long sequences [24].
The Transformer architecture replaces RNNs (Recurrent Neural Networks) with self-attention, enabling parallel processing and capturing long-range dependencies more effectively.
Seq2SeqPlus Architecture. Seq2SeqPlus improves upon Seq2Seq by integrating attention, Transformer blocks, and hybrid mechanisms. It consists of two main components—Encoder Block and Decoder Block.
Encoder Block. The encoder processes sequential input data to generate contextual annotations. We use a bidirectional RNN (BiRNN) to capture both past and future temporal dependencies within a fixed window. The forward RNN processes the time series $x = (x_1, \ldots, x_T)$ chronologically (from $x_1$ to $x_T$), producing hidden states $\overrightarrow{h}_t$ that encode historical trends:
$$\overrightarrow{h}_t = f\left( \overrightarrow{h}_{t-1},\ x_t \right).$$
The backward RNN processes the time series in reverse (from $x_T$ to $x_1$), generating hidden states $\overleftarrow{h}_t$ to incorporate future context within the window:
$$\overleftarrow{h}_t = f\left( \overleftarrow{h}_{t+1},\ x_t \right).$$
These states are concatenated into an annotation vector $h_t = \left[ \overrightarrow{h}_t;\ \overleftarrow{h}_t \right]$, which summarizes both past and future context around $x_t$. The annotations are later used by the decoder to compute dynamic attention weights.
Decoder Block. The decoder generates future predictions step-by-step, conditioned on both past decoder states and relevant historical patterns identified by the attention mechanism. At each step $t$, the decoder computes the conditional probability of the next value $y_t$ as
$$p\left( y_t \mid y_1, \ldots, y_{t-1}, x \right) = g\left( y_{t-1},\ s_t,\ c_t \right),$$
where $s_t$ is the decoder’s hidden state, $y_{t-1}$ is the previous prediction, and $c_t$ is a time-dependent context vector encoding the attended historical observations.
The decoder state $s_t$ updates recursively using
$$s_t = f\left( s_{t-1},\ y_{t-1},\ c_t \right),$$
where $f$ is a recurrent unit (LSTM). The context vector $c_t$ is a weighted sum of the encoder annotations $h_j$:
$$c_t = \sum_{j=1}^{T} \alpha_{tj}\, h_j,$$
with weights $\alpha_{tj}$ computed via a soft attention mechanism:
$$\alpha_{tj} = \frac{\exp\left( e_{tj} \right)}{\sum_{k=1}^{T} \exp\left( e_{tk} \right)}, \qquad e_{tj} = a\left( s_{t-1},\ h_j \right).$$
Here, $e_{tj}$ scores how well the context window around $h_j$ aligns with the current forecast step $t$, parametrized by a feedforward network $a(\cdot)$.
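A minimal PyTorch sketch of one decoding step with this additive attention mechanism follows; all layer sizes and the single-variable output are illustrative assumptions.
```python
# Minimal sketch of one attention-based decoding step for a univariate forecast.
import torch
import torch.nn as nn

class AttentionDecoderStep(nn.Module):
    def __init__(self, enc_dim, dec_dim, attn_dim=64):
        super().__init__()
        self.score = nn.Sequential(                      # feedforward alignment model a(.)
            nn.Linear(enc_dim + dec_dim, attn_dim), nn.Tanh(), nn.Linear(attn_dim, 1)
        )
        self.rnn = nn.LSTMCell(1 + enc_dim, dec_dim)     # consumes y_{t-1} and the context c_t
        self.out = nn.Linear(dec_dim, 1)                 # next consumption value

    def forward(self, annotations, s_prev, cell_prev, y_prev):
        # annotations: (B, T, enc_dim); s_prev, cell_prev: (B, dec_dim); y_prev: (B, 1)
        T = annotations.size(1)
        s_rep = s_prev.unsqueeze(1).expand(-1, T, -1)            # align s_{t-1} with each annotation h_j
        e = self.score(torch.cat([s_rep, annotations], dim=-1))  # alignment scores e_{tj}, (B, T, 1)
        alpha = torch.softmax(e, dim=1)                          # attention weights alpha_{tj}
        context = (alpha * annotations).sum(dim=1)               # context vector c_t, (B, enc_dim)
        s_t, cell_t = self.rnn(torch.cat([y_prev, context], dim=-1), (s_prev, cell_prev))
        return self.out(s_t), s_t, cell_t
```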
Model performance evaluation
In this study, model evaluation focused on well-established error-based performance metrics such as MAE, RMSE, R2, and WAPE, which directly measure the accuracy and reliability of forecasts in practical, interpretable terms [25]. These metrics are widely accepted in both academic and applied forecasting domains—particularly for large-scale deep learning models—due to their robustness and relevance to operational decision-making. We deliberately did not apply traditional statistical significance tests (e.g., p-values for forecasted outcomes), as such methods are generally not standard in performance validation for time-series models, especially when predictions are autocorrelated and models operate on overlapping sequences. Nonetheless, we acknowledge the potential value of statistical testing in post-forecast residual analysis, and future work may integrate runs-based tests, Geary’s test, or two-dimensional bit-sequence analysis to further assess structural forecast reliability and distributional alignment over time [26,27].
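For transparency, a small NumPy sketch of the metrics used in this study is given below; the function name and dictionary output are ours rather than a specific library's API.
```python
# Error metrics used in this study: MAE, RMSE, R2, and WAPE.
import numpy as np

def forecast_metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    wape = np.sum(np.abs(err)) / np.sum(np.abs(y_true))   # weighted absolute percentage error
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "WAPE": wape}
```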
4. Results
The primary raw data we used for the research were contained in two original datasets. The first one consisted of 55 columns containing billing data of 105,527 households in the Volyn region of Ukraine on monthly natural gas consumption from January 2019 to April 2023 (52 periods in total) in cubic meters. The dataset columns were as follows:
Column 1 “ID”—household ID;
Column 2 “Category”—household categories prescribed by the Gas Distribution System Code depending on the purpose of consumption (home heating, water heating, cooking), volumes of consumption, and housing floor area [28];
Column 3 “GDS”—household affiliation with a gas distribution station (39 GDSs in total);
Columns 4–55 “Consumption”—Monthly volumes of consumption from January 2019 to April 2023 in cubic meters.
We performed an exploratory data analysis that showed the presence of seasonality and autocorrelation in consumption volumes, as well as a statistically significant correlation between the volume of natural gas consumed and environmental temperature in certain categories of household consumers (categories 4, 5, 6, 7, 10, 11, 12, 14, 15, 16, 17) (Figure 2 and Figure 3; Table 1).
A slight level of correlation and autocorrelation is observed in consumer groups that use natural gas for cooking only (categories 1, 2, 8) or for cooking and/or water heating (categories 3, 9).
The second dataset consisted of 39 rows and 53 columns containing meteorological data on the average monthly air temperature measured at each of the 39 GDSs from January 2019 to April 2023. The dataset columns were as follows (Figure 2 and Figure 3):
Column 1 “GDS”—name of a gas distribution station;
Columns 2–53 “Temperature”—average monthly temperature measured at a gas distribution station.
Both primary datasets were transformed into a single long-format dataset consisting of 5,381,877 rows and five columns for the subsequent model training process.
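A hedged pandas sketch of this wide-to-long transformation is shown below; the file names and the date-label format are placeholders, while the column names mirror the dataset description.
```python
# Sketch of the wide-to-long reshaping of the billing and temperature datasets.
import pandas as pd

billing = pd.read_csv("billing_wide.csv")          # ID, Category, GDS, 52 monthly columns (placeholder file)
weather = pd.read_csv("gds_temperature_wide.csv")  # GDS, 52 monthly temperature columns (placeholder file)

months = [c for c in billing.columns if c not in ("ID", "Category", "GDS")]

consumption = billing.melt(id_vars=["ID", "Category", "GDS"],
                           value_vars=months,
                           var_name="Date", value_name="Consumption")
temperature = weather.melt(id_vars=["GDS"], var_name="Date", value_name="Temperature")

long_df = (consumption
           .merge(temperature, on=["GDS", "Date"], how="left")
           # assumes month columns are labeled like "2019-01"; adjust the format if needed
           .assign(Date=lambda d: pd.to_datetime(d["Date"], format="%Y-%m"))
           [["ID", "Date", "Temperature", "Category", "Consumption"]])
```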
Because the data originate from real billing records, values were missing in certain months for some consumers, totaling 113,014 rows, which had to be addressed before further development.
Owing to our uncertainty about the origin of the NaN values—either there was no gas consumption in particular months or the meter readings were not taken—we applied two approaches, resulting in two different datasets for the models to be trained on.
A simpler approach involves replacing the “NaN” values with “0” values. Given the small number of such values (about 2%), we can assume that this will not have a critical impact on the performance of the models.
A more sophisticated approach involves replacing NaN values using Household-Specific Seasonal Imputation and K-Nearest Neighbors (KNN) techniques [29].
The Household-Specific Seasonal Imputation technique was chosen because it considers seasonality and fills missing values using the average gas consumption for the same month in other years for each consumer individually.
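A minimal sketch of this household-specific seasonal imputation, assuming the long-format dataframe produced above, is as follows.
```python
# Sketch of household-specific seasonal imputation: fill each missing value with that
# household's average consumption for the same calendar month in other years.
import pandas as pd

def seasonal_impute(long_df: pd.DataFrame) -> pd.DataFrame:
    df = long_df.copy()
    df["Month"] = df["Date"].dt.month
    # per-household, per-calendar-month mean; NaN values are ignored by default
    seasonal_mean = df.groupby(["ID", "Month"])["Consumption"].transform("mean")
    df["Consumption"] = df["Consumption"].fillna(seasonal_mean)
    return df.drop(columns="Month")
```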
However, there was an issue: for some households, the values for a given month were missing across several years in the original dataset. The solution was to apply a more complicated technique, KNN, which uses patterns from similar households to estimate missing values and needs a large dataset with strong correlations. For our purposes, KNNImputer was used with the parameters n_neighbors = 7, weights = ‘uniform’, and metric = ‘nan_euclidean’, with fine-tuning performed by masking known values and selecting the n_neighbors value that minimized the imputation error (RMSE).
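The KNN-based fallback can be sketched with scikit-learn's KNNImputer using the parameters reported above; pivoting to one row per household and one column per month is our assumption about how similar households are compared.
```python
# Sketch of KNN-based imputation of remaining gaps with scikit-learn's KNNImputer.
import pandas as pd
from sklearn.impute import KNNImputer

# one row per household, one column per month
wide = long_df.pivot(index="ID", columns="Date", values="Consumption")

imputer = KNNImputer(n_neighbors=7, weights="uniform", metric="nan_euclidean")
wide_imputed = pd.DataFrame(imputer.fit_transform(wide),
                            index=wide.index, columns=wide.columns)

# back to long format for merging with the other features
imputed_long = wide_imputed.stack().rename("Consumption_imputed").reset_index()
```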
Once imputation had been performed, we compared the original and imputed data to check whether the imputed values follow the original distribution. The mean and variance check results are presented in Table 2, confirming that the imputation was successful.
As a result of data wrangling and preprocessing, we obtained two final time-series datasets (with NaN values replaced by 0 and imputed using the special techniques, respectively) containing consumption data with pronounced seasonality, a statistically significant correlation with temperature data, and consumption category data as a supporting attribute (Figure 4). The final datasets’ columns were as follows:
Column 1 “ID”—household ID (Series Identifier);
Column 2 “Date”—month and year (Timestamp);
Column 3 “Temperature”—average monthly temperature measured on a gas distribution station the household is affiliated with (Covariate Feature);
Column 4 “Category”—household category (Attribute Feature);
Column 5 “Consumption”—Monthly volumes of consumption from January 2019 to April 2023 in cubic meters (Target Variable).
They were then ready to be used in the model training process.
Three cutting-edge deep neural network models, Seq2SeqPlus, TiDE, and TFT, were trained on the final datasets with the following parameters (a minimal configuration sketch is given after the list):
Chronological data splitting was applied; the earliest 80% of rows were assigned to training, the next 10% to validation, and the most recent 10% to test.
The forecast horizon was set to 12 months to support year-ahead forecasting, and the context window, which defines the input lags to the model for each time series, was set to 24.
Data granularity—monthly
The RMSE metric was set as an optimization objective.
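The configuration sketch referenced above is as follows; the constants and the row-wise chronological split are a simplified stand-in for the training platform's actual data splitting.
```python
# Sketch of the training configuration: chronological 80/10/10 split and forecasting parameters.
import pandas as pd

FORECAST_HORIZON = 12            # months ahead (year-ahead forecasting)
CONTEXT_WINDOW = 24              # input lags per time series
DATA_GRANULARITY = "monthly"
OPTIMIZATION_OBJECTIVE = "rmse"  # optimization objective used for all three models

def chronological_split(long_df: pd.DataFrame):
    """Earliest 80% of rows for training, next 10% for validation, most recent 10% for test."""
    df = long_df.sort_values("Date")
    n = len(df)
    train = df.iloc[: int(0.8 * n)]
    valid = df.iloc[int(0.8 * n): int(0.9 * n)]
    test = df.iloc[int(0.9 * n):]
    return train, valid, test
```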
The compute engine parameters used for model training are as follows: GPU Type—NVIDIA L4 (4 vCPU, 2 Core, 16 GB), Number of GPUs—4, Data disk size—100 GB. The computational facilities were limited to 2 node hours and the number of training epochs to 10 per model.
After the training was performed and the models were tested, we obtained the following results (Table 3).
Model feature attribution is expressed using the SHAP methodology [30].
SHAP (SHapley Additive exPlanations) is a powerful method for interpreting machine learning models, including deep neural networks, by quantifying the contribution of each feature to the model’s predictions. SHAP is based on Shapley values from cooperative game theory, which fairly distribute the prediction outcome among the input features. It explains how much each feature contributes (positively or negatively) to a given prediction.
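As an illustration of the attribution workflow, the following hedged sketch applies the model-agnostic KernelExplainer from the shap library to a placeholder forecaster; the predict function, background sample, and feature layout are assumptions, not the exact pipeline used by the training platform.
```python
# Sketch of SHAP-style feature attribution with a placeholder predict function.
import numpy as np
import shap

rng = np.random.default_rng(0)

def predict_fn(X):
    # placeholder forecaster over [24 consumption lags, temperature, category code]
    return X[:, :-2].mean(axis=1) * 0.8 + X[:, -2] * 0.1

background = rng.normal(size=(50, 26))   # small reference sample of inputs
X_sample = rng.normal(size=(20, 26))     # instances to explain

explainer = shap.KernelExplainer(predict_fn, background)
shap_values = explainer.shap_values(X_sample)

# mean absolute SHAP value per feature, normalized to percentages
importance = np.abs(shap_values).mean(axis=0)
importance_pct = 100 * importance / importance.sum()
```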
The feature attribution of the trained models is presented in Figure 4.
Model feature attribution tells us how important each feature is when making a prediction. Attribution values are expressed as a percentage; the higher the percentage, the more strongly that feature impacts a prediction on average.
5. Discussion
Considering the results of our research, the following conclusions can be drawn.
A global modeling approach was adopted in this study to leverage common seasonal, temperature-dependent, and structural consumption patterns observed across a large population of households. This decision was driven by the practical need for scalable, centralized forecasting models applicable to regulatory planning across regions. At the same time, we acknowledge the heterogeneity among consumer behaviors—especially across usage categories and geographic areas—and addressed this in part by incorporating category and temperature as model features. While global models efficiently generalize shared patterns, further research may explore cluster-based segmentation or hybrid approaches to better capture subgroup-specific dynamics without sacrificing scalability.
Past consumption is a dominant feature in all trained models, which may confirm their autoregressive origins and the high importance of historical consumption data in our context. These results also demonstrate the strong seasonal and temporal patterns of household consumption.
Meanwhile, temperature and category features are to be considered as supportive due to their moderate influence level—they provide useful fine-tuning rather than the primary signal.
This confirms that our models are specifically designed to handle both autoregressive dependencies (past gas consumption) and exogenous factors (temperature, category, etc.), as well as their ability to learn temperature effects indirectly through past gas consumption.
Nevertheless, the importance of the temperature feature was demonstrated in our recent paper using the Granger causality test, which means our models should benefit from including it explicitly [13].
The consumption category feature is valuable as well because it helps differentiate between groups with strong and weak seasonal behavior. It prevents seasonality bias, ensuring that groups with non-seasonal behavior are still predicted accurately, and interacts with temperature, helping the model decide when temperature matters more or less.
Without this feature, our models would treat all consumers the same, potentially leading to overgeneralization when the model may only learn the dominant pattern (e.g., strong seasonality) and ignore less seasonal groups, or loss of detail when consumers with flat or irregular consumption patterns might not be predicted well.
There are several general trends we observed regarding the models trained on the “Totally Imputed” vs. “Zero-filled” datasets, based on the performance metrics obtained:
- Totally Imputed data consistently improves performance across all models compared to Zero-filled data.
- R2 drops slightly with Zero-filled data, meaning the models explain less variance.
- Seq2Seq suffers the most from missing values, while TiDE is the most stable.
All models show approximately the same levels of accuracy, especially on the processed and imputed data, which may indicate the acceptability of all the described DNN techniques for achieving our goals using real-life data.
When finally choosing a model for use at the deployment stage, one should consider such forecasting parameters as completeness and robustness of the data and forecasting horizon. Of course, further periodic monitoring of accuracy levels and retraining on new data is recommended to maintain the functional parameters of the models.
Model Insights and Suitability. The Seq2SeqPlus model with an attention mechanism demonstrated a strong performance in terms of MAE and RMSE, particularly on the imputed dataset. It effectively captured temporal dependencies and historical consumption trends, leading to reliable short-term forecasts.
However, its MAPE was comparatively high, suggesting that the model was less effective at capturing low-consumption fluctuations, a common issue when prioritizing absolute over percentage-based errors.
While Seq2SeqPlus may require additional feature engineering to generalize across consumption categories with non-seasonal behaviors, it remains a strong candidate for short-term aggregate forecasting, particularly when the input data is complete and well-processed.
TiDE emerged as the most balanced model across all evaluation metrics. It showed robust performance on both datasets, with minimal degradation in the presence of missing data, which points to its stability under real-world conditions. Although it did not outperform the other models in any single metric, its consistency and relatively low variance make it a reliable choice for general-purpose forecasting.
TiDE’s simplicity also offers faster training times and easier hyperparameter tuning, which are advantageous in operational environments that require frequent model retraining.
The TFT model offered the greatest flexibility and interpretability, excelling in handling diverse feature types (static, known, and observed time-varying variables).
While slightly underperforming Seq2SeqPlus on some absolute metrics (MAE, RMSE), its ability to model complex temporal relationships and provide explanatory insight via attention mechanisms and SHAP analysis makes it valuable for long-term forecasting and strategic planning applications.
However, TFT’s complexity and sensitivity to imputation strategies make it more challenging to deploy in volatile or sparse data environments without additional regularization and tuning.
Based on the comparative analysis and the specific forecasting objectives, the models can be recommended for different use cases (Table 4):
For real-world deployment, it is advisable to consider MAE or WAPE as primary optimization objectives, as these metrics align more closely with minimizing total consumption error—critical for operational planning. Furthermore, an ensemble approach combining the strengths of Seq2SeqPlus and TiDE may offer a balance between responsiveness and robustness.
The influence of data preprocessing—particularly the treatment of missing values—proved to be a critical factor in model performance. All three models consistently showed better accuracy on the imputed dataset compared to the version where missing values were simply replaced with zeros. The difference was most apparent in the MAE and RMSE values, which substantially degraded on the NaN-filled data.
This suggests that imputation strategies should be treated as an integral part of model development, not just a preprocessing step.
Given the high seasonality observed in some consumer categories, imputing based on historical seasonal trends and similar profile behavior appears to preserve important patterns that the models can effectively learn from.
For large-scale utility forecasting applications, it is essential to adopt context-aware imputation methods, such as seasonal medians or clustering-based smoothing, to maintain model accuracy while minimizing information loss.
6. Conclusions
The application of deep neural network (DNN) models to gas consumption forecasting represents a significant advancement in aligning predictive analytics with the operational and regulatory needs of modern energy systems. Unlike traditional statistical approaches, DNN-based models—such as Seq2SeqPlus, TiDE, and TFT—offer the flexibility to incorporate complex temporal patterns, consumer heterogeneity, and external variables such as weather or calendar effects.
In regulated energy markets, where forecasting accuracy directly impacts supply planning, tariff setting, and network balancing obligations, the ability of these models to deliver reliable, high-resolution forecasts at scale is particularly valuable.
Moreover, explainable architectures like TFT further enhance transparency, supporting compliance with regulatory requirements for model interpretability and auditability.
Equally important, these models can be tuned to prioritize total consumption accuracy, which aligns well with regulatory metrics such as aggregate demand forecasting accuracy, under-/over-supply penalties, and carbon accounting targets.
As regulators increasingly demand real-time responsiveness, robust handling of incomplete data, and forward-looking risk modeling, DNN-based forecasting systems are poised to become a cornerstone of compliant, data-driven energy operations.
Ultimately, the successful deployment of these models requires more than technical precision—it demands careful integration with data governance, regulatory frameworks, and operational workflows. When implemented thoughtfully, DNN forecasting tools can significantly enhance both regulatory compliance and strategic energy planning in the gas sector.