A Multi-Head Attention-Based TimesNet for Heat Production Planning Under Unknown Future Demands

Kim, Jahun; Lee, Sangjun; Park, In-Beom; Kim, Kwanho

doi:10.3390/en18225963

Open Access Article


by Jahun Kim, Sangjun Lee, In-Beom Park * and Kwanho Kim *

Department of Industrial and Systems Engineering, Dongguk University, Seoul 04620, Republic of Korea

* Authors to whom correspondence should be addressed.

Energies 2025, 18(22), 5963; https://doi.org/10.3390/en18225963

Submission received: 15 October 2025 / Revised: 10 November 2025 / Accepted: 11 November 2025 / Published: 13 November 2025

(This article belongs to the Section G: Energy and Buildings)


Abstract

Efficient operational planning in district heating systems (DHSs) is essential for minimizing operating costs and maximizing energy efficiency. However, because practitioners in real-world energy systems must determine production plans while future demands and costs are unknown, the production planning problems of DHSs are challenging to solve. In this paper, we propose a multi-head attention-based TimesNet (MATN) that incorporates a Transformer decoder and operates solely on a 24 h lookback window without requiring any future information. The model is trained in an end-to-end manner on a dataset built by solving a mixed integer programming (MIP) model. Experimental results demonstrate that the proposed MATN model significantly outperforms baseline deep learning-based methods. A qualitative analysis of the hourly production plans further indicates that MATN generates robust operational plans that mimic those generated by the MIP model, which suggests that the proposed approach is effective in terms of economic efficiency and operational stability without depending on future information.

Keywords:

district heating system; heat production planning; multi-head attention; deep neural networks; unknown future demands

1. Introduction

A district heating system (DHS) is critical infrastructure for efficiently delivering thermal energy to urban areas [1,2]. The core operational task of a DHS is the heat production planning problem, which determines the production levels of heating plants in real time to satisfy fluctuating demand while minimizing operational costs [3,4]. Unfortunately, such problems are challenging to solve effectively, since operators must balance the trade-offs among generating heat, utilizing stored thermal energy, and transacting with external networks [5,6,7]. The economic incentives for these choices are driven by volatile energy prices, while the feasibility of any plan is constrained by the physical limitations of the production units, such as minimum uptime and downtime requirements, which create strong dependencies between time periods in the decision-making process [8,9,10].

This planning complexity, arising from temporal dependencies, is further amplified by operational uncertainty. Real-world demand and market prices, driven by factors such as weather and unplanned events, often deviate from forecasts. These discrepancies can render a precomputed optimal plan inefficient or even infeasible when confronted with actual conditions, leading to avoidable costs and stability risks. The combination of strong temporal constraints and input uncertainty makes optimized plans sensitive to forecast errors, because deviations from predicted conditions can cascade through temporal dependencies and degrade plan quality [11,12,13,14,15,16].

This study focuses on determining heat production plans when future information is unknown. Specifically, production plans are derived from demand and production cost information over a 24 h history window. This design aims to enable immediate replanning whenever new demands and costs are observed [17,18]. Mixed-integer programming (MIP) is a well-known approach to developing production plans, valued for its ability to deliver optimal solutions [19]. Its primary drawback, however, is its high computational cost, which hinders rapid replanning in dynamic scenarios [20,21].

To overcome this limitation, prior work has employed deep learning-based approaches that were trained on the outputs of MIP [22,23]. These approaches successfully demonstrated that a trained model could generate multi-period production plans while requiring a short inference time. Yet, a drawback of these approaches is their dependence on complete, day-ahead forecasts for training. These dependencies on deterministic inputs make the models highly sensitive to the inevitable errors of real-world forecasting [24,25], which might lead to a significant performance gap between simulation and deployment [26].

Motivated by the above challenges, we propose MATN, a multi-head attention-based TimesNet designed to build heat production plans under the dynamic and uncertain conditions of real-world DHSs. The architecture of MATN is inspired by recent advances in time-series analysis and sequence-to-sequence modeling, integrating a TimesNet encoder for periodic pattern extraction with a Transformer-based decoder [27,28,29,30]. MATN is trained on production plans generated by an MIP model [22]. The trained network first encodes the 24 h history of demand and cost into a rich temporal embedding, and the decoder then queries this embedding at each decision step, generating a coherent plan autoregressively. As a result, MATN quickly produces cost-effective schedules, bridging the gap between offline optimization and the requirements of practical, online deployment [31].

The remainder of this paper is structured as follows. Section 2 reviews prior work in DHS optimization and relevant deep learning methodologies. Section 3 formally defines the heat production planning problem and presents the MIP formulation used to generate training data. Section 4 details the proposed MATN architecture and the cost-aware training framework. Section 5 presents a comprehensive experimental evaluation of our model against baseline methods, analyzing its performance in terms of accuracy, economic efficiency, and operational stability. Finally, Section 6 concludes the paper with a summary of our findings and a discussion of future research directions.

2. Literature Review

Classical optimization formulates district heating operation planning with MIP, which minimizes operating costs under realistic plant and storage constraints [11,31,32]. These models capture production limits, storage bounds, and unit commitment rules, including minimum uptime and minimum downtime, typically through mixed integer linear programming and, in some cases where efficiency or investment effects are represented, mixed integer nonlinear programming [13,33,34,35,36,37]. These formulations offer clear optimality guarantees and represent complex feasibility sets with high fidelity. However, the computational burden rises steeply with longer horizons and richer constraints, which limits their ability to replan quickly as conditions change in real time [20,38,39,40]. This limitation motivates approximations that deliver near-optimal plans at low latency while respecting engineering constraints.

Data-driven forecasting improves the inputs to optimization by predicting heat demand and related variables from historical observations [16,17,41,42]. Early studies employed support vector machines and multilayer perceptrons, and subsequent work adopted recurrent models such as long short-term memory and gated recurrent units to capture temporal dependence among demand, temperature, and other covariates [27,43,44,45]. More recent time-series architectures, including TimesNet, learn the multi-periodic structures that often appear in the daily and weekly cycles of district heating data [29]. These advances increase predictive accuracy, yet most methods assume access to future exogenous signals such as weather forecasts and still hand their predictions to a separate optimizer [46]. Under information constraints where only past data are available at decision time, forecasting alone is insufficient for end-to-end planning [47,48].

Deep learning for planning seeks to learn policies that map observed histories to feasible, low-cost schedules [49,50,51,52]. Supervised approaches train on optimal schedules generated offline by an MIP model but frequently assume look-ahead inputs over the next 24 hours and therefore diverge from real operations when future values are unknown [23]. Reinforcement learning has been explored for building and energy control because it can learn long-horizon behavior from interaction, yet it requires a trusted simulator, stable training, and careful safety assurance before deployment in thermal networks [53,54,55,56]. These limitations define a clear gap. The present work addresses this gap by developing a real-time planning policy from only past observations, using MATN to encode multi-periodic temporal structure and a cost-aware objective to internalize economic goals and switching stability [57]. This design retains the strengths of classical formulations by learning from optimal solutions while eliminating the need for future information at inference time [58,59].

3. Problem Description

The primary objective when operating a DHS is to satisfy time-varying heat demand reliably while minimizing total operational cost. For each time step, an operator must determine the optimal production level of the heat plant. This decision is made within a system comprising the production units, a thermal energy storage facility, and an external heat network, as illustrated in Figure 1.

This configuration provides crucial operational flexibility. Surplus heat can be stored or sold to the external network, while deficits can be met by discharging the storage or purchasing from the network. Consequently, an optimal production plan transcends simple demand-following, requiring a sophisticated strategy to leverage these components in the most economical way based on factors like production prices and the storage’s state of charge.

This complex, sequential decision-making problem is formally modeled as an MIP problem. Under the ideal assumption that all future information over the planning horizon is known, this model provides a theoretically optimal solution. In this study, the MIP formulation serves two critical roles: generating expert-level training data and establishing a benchmark for evaluating the performance of the proposed MATN model.

Using the notation defined in Table 1, we adopt the MIP formulation of a previous study [23] to generate production plans for the training datasets. This MIP model also serves as our optimization-based benchmark, providing solutions for training and evaluation, and is defined as follows.

\text{Minimize} \quad \sum_{t=1}^{T} H_t x_t + \sum_{t=1}^{T} C_t q_t + \sum_{t=1}^{T} \mathit{EF}_t s_t + \sum_{t=1}^{T} \mathit{ET}_t r_t

(1)

\text{Subject to} \quad q_{t-1} + x_t + r_t = q_t + s_t + D_t, \quad t = 1, \dots, T

(2)

x_t \leq P_{\max} y_t, \quad t = 1, \dots, T

(3)

x_t \geq P_{\min} y_t, \quad t = 1, \dots, T

(4)

x_t = \sum_{i=1}^{K} F_i u_{i,t}, \quad t = 1, \dots, T

(5)

\sum_{i=1}^{K} u_{i,t} \leq 1, \quad t = 1, \dots, T

(6)

y_t - y_{t-1} = z_t - w_t, \quad t = 1, \dots, T

(7)

y_{t+a} \geq z_t, \quad t = 1, \dots, T, \; 0 \leq a < \mathit{OT}

(8)

y_{t+b} \leq 1 - w_t, \quad t = 1, \dots, T, \; 0 \leq b < \mathit{OT}

(9)

q_t \leq \mathit{PC}_{\max}, \quad t = 1, \dots, T

(10)

q_t \geq \mathit{PC}_{\min}, \quad t = 1, \dots, T

(11)

y_t, z_t, w_t \in \{0, 1\}, \quad t = 1, \dots, T

(12)

u_{i,t} \in \{0, 1\}, \quad t = 1, \dots, T, \; i = 1, \dots, K

(13)

x_t, q_t, r_t, s_t \in \mathbb{R}^{+}, \quad t = 1, \dots, T

(14)

Equation (1) minimizes the total operating cost over the planning horizon by aggregating production costs $H_t x_t$, inventory-holding costs $C_t q_t$, and external purchase costs $\mathit{EF}_t s_t$, offset by revenue from external sales $\mathit{ET}_t r_t$. The intertemporal energy-balance constraint (2) links consecutive periods by ensuring that the heat inventory consistently reflects production, demand, storage inflow and outflow, and external exchanges. Constraints (3)–(4) tie feasible output to the unit's on/off status and enforce the minimum and maximum production capacities. Constraints (5)–(6) encode the discrete operating regime by selecting at most one production level per period (a one-hot selection). Constraints (7)–(9) impose operational logic such as minimum uptime and downtime. Constraints (10)–(11) enforce the storage's upper and lower bounds. Finally, Constraints (12)–(14) specify the variable domains, yielding a standard MIP that can be solved with off-the-shelf optimizers.
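As an illustration of the commitment logic in Constraints (7)–(9), the following Python sketch derives startup and shutdown indicators from an on/off sequence and checks the minimum up/down-time rule. It is a standalone feasibility checker, not part of the MIP model, and it rests on stated assumptions: a single minimum time OT for both uptime and downtime (as in the formulation), a hypothetical initial state $y_0 = 0$, 0-based Python indexing, and windows that are simply truncated at the end of the horizon.

```python
from typing import List, Tuple


def startups_shutdowns(y: List[int], y0: int = 0) -> Tuple[List[int], List[int]]:
    """Derive z_t (startup) and w_t (shutdown) so that y_t - y_{t-1} = z_t - w_t, as in (7)."""
    z, w = [], []
    prev = y0  # assumed initial on/off state y_0
    for cur in y:
        z.append(1 if cur - prev == 1 else 0)
        w.append(1 if cur - prev == -1 else 0)
        prev = cur
    return z, w


def satisfies_min_up_down(y: List[int], ot: int, y0: int = 0) -> bool:
    """Check (8) and (9): after a startup stay on, after a shutdown stay off, for OT periods."""
    z, w = startups_shutdowns(y, y0)
    horizon = len(y)
    for t in range(horizon):
        for a in range(ot):
            if t + a >= horizon:
                break  # window runs past the horizon; assumed truncated
            if z[t] == 1 and y[t + a] < 1:   # violates (8)
                return False
            if w[t] == 1 and y[t + a] > 0:   # violates (9)
                return False
    return True
```

For example, with OT = 3 the sequence `[0, 1, 1, 1, 0, 0, 0]` passes, whereas `[0, 1, 0, 1]` with OT = 2 fails because the unit shuts down one period after starting.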

4. Proposed Method

4.1. Overall Framework

The overall methodology follows a two-stage framework, as presented in Figure 2. The primary objective of the first stage is to prepare a comprehensive, high-fidelity dataset for training MATN to learn optimal decision patterns. Because historical operating records often lack the breadth to cover enough situations for robust learning, we use the MIP model defined in Section 3 to compute theoretically optimal plans for diverse simulated scenarios, which then serve as the ground truth for training. The methodology for generating these optimal plans follows the approach detailed in prior work [23]. As depicted in the training dataset in Figure 2, each training instance maps an input structure to its corresponding optimal action. This structure combines the previous system state, namely the heat inventory and production level, with a 24 h historical window of time-series data such as demand and cost. Using this dataset, MATN iteratively updates its parameters to minimize the discrepancy between its predictions and these optimal solutions.

In the second stage, the trained model produces plans in real time, mimicking actual operations. Unlike the MIP model, it does not rely on any forecasts of future demand or cost. Reflecting the data structure shown in Figure 2, MATN receives only the available past time series combined with the immediately preceding system state as input and outputs the production level for the current time step. This decision is then used to update the next system state, and the procedure repeats across the planning horizon in an autoregressive rollout, yielding a complete operating schedule.

4.2. Data Configuration and Inference Method

Because the study targets a realistic setting without future information, the model inputs and outputs are restructured as in Table 2. At time $t$, the input consists of the preceding 24 h time series together with the immediately preceding system state. Specifically, it includes the heat demand sequence $(D_{t-23}, \dots, D_t)$, the unit production price sequence $(H_{t-23}, \dots, H_t)$, the previous inventory $q_{t-1}$, and the previous production $x_{t-1}$. The scalar states $q_{t-1}$ and $x_{t-1}$ are broadcast across the 24 time steps to match the temporal dimension of the time series. All inputs are min–max normalized to the range [0, 1].
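The input layout described above can be sketched as follows. The hypothetical helper `build_input` stacks the two 24 h sequences with the broadcast scalar states into a 24 × 4 feature matrix and applies min–max scaling. The column order and the per-feature (min, max) bounds, which in practice would be fitted on the training data, are assumptions for illustration only.

```python
from typing import List, Sequence, Tuple

WINDOW = 24  # lookback length in hours


def min_max(value: float, lo: float, hi: float) -> float:
    """Scale into [0, 1] using fitted bounds; values outside the bounds are not clipped."""
    if hi == lo:
        return 0.0  # degenerate feature; avoid division by zero
    return (value - lo) / (hi - lo)


def build_input(
    demand: Sequence[float],                 # (D_{t-23}, ..., D_t)
    price: Sequence[float],                  # (H_{t-23}, ..., H_t)
    q_prev: float,                           # previous inventory q_{t-1}
    x_prev: float,                           # previous production x_{t-1}
    bounds: Sequence[Tuple[float, float]],   # assumed (min, max) per column
) -> List[List[float]]:
    """Return a WINDOW x 4 matrix with columns [D, H, q_{t-1}, x_{t-1}]."""
    if len(demand) != WINDOW or len(price) != WINDOW:
        raise ValueError("expected 24 h windows for demand and price")
    if len(bounds) != 4:
        raise ValueError("expected one (min, max) pair per column")
    rows = []
    for d, h in zip(demand, price):
        raw = (d, h, q_prev, x_prev)  # scalar states repeated (broadcast) at every step
        rows.append([min_max(v, lo, hi) for v, (lo, hi) in zip(raw, bounds)])
    return rows
```

In a PyTorch pipeline the same matrix would typically become a tensor of shape (24, 4), with a leading batch dimension added by the data loader.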

The model output $u_{i,t}$ is defined as a classification over four discrete production levels to be executed at time $t$.

During deployment, the trained model operates autoregressively, following the inference rules in Equations (15)–(21). At $t = 0$, the model takes the actual past data as input and predicts the first production level $\hat{u}_{i,t}$; the resulting system state is then fed back as input for the next step. This process is repeated over the 120 h planning horizon to produce the full operating schedule. This inference methodology is adopted from prior work [23].

\hat{x}_t = \sum_{i=1}^{K} F_i \hat{u}_{i,t}

(15)

\hat{y}_t = \begin{cases} 1, & \text{if } \hat{x}_t > 0 \\ 0, & \text{otherwise} \end{cases}

(16)

\hat{z}_t = \begin{cases} 1, & \text{if } \hat{y}_t = 1 \text{ and } \hat{y}_{t-1} = 0 \\ 0, & \text{otherwise} \end{cases}

(17)

\hat{w}_t = \begin{cases} 1, & \text{if } \hat{y}_t = 0 \text{ and } \hat{y}_{t-1} = 1 \\ 0, & \text{otherwise} \end{cases}

(18)

\hat{q}_t = \begin{cases} \mathit{PC}_{\max}, & \text{if } \hat{q}_{t-1} + \hat{x}_t - D_t \geq \mathit{PC}_{\max} \\ \hat{q}_{t-1} + \hat{x}_t - D_t, & \text{if } \mathit{PC}_{\min} < \hat{q}_{t-1} + \hat{x}_t - D_t < \mathit{PC}_{\max} \\ \mathit{PC}_{\min}, & \text{if } \hat{q}_{t-1} + \hat{x}_t - D_t \leq \mathit{PC}_{\min} \end{cases}

(19)

\hat{r}_t = \begin{cases} \hat{q}_{t-1} + \hat{x}_t - D_t - \mathit{PC}_{\max}, & \text{if } \hat{q}_{t-1} + \hat{x}_t - D_t \geq \mathit{PC}_{\max} \\ 0, & \text{if } \hat{q}_{t-1} + \hat{x}_t - D_t < \mathit{PC}_{\max} \end{cases}

(20)

\hat{s}_t = \begin{cases} \mathit{PC}_{\min} - (\hat{q}_{t-1} + \hat{x}_t - D_t), & \text{if } \hat{q}_{t-1} + \hat{x}_t - D_t \leq \mathit{PC}_{\min} \\ 0, & \text{if } \hat{q}_{t-1} + \hat{x}_t - D_t > \mathit{PC}_{\min} \end{cases}

(21)

These equations define the state-update rules used during the autoregressive rollout. Equation (15) translates the model's categorical output $\hat{u}_{i,t}$ into a physical production quantity $\hat{x}_t$. Equations (16)–(18) then determine the plant's on/off state $\hat{y}_t$ and track startup ($\hat{z}_t$) and shutdown ($\hat{w}_t$) events. Finally, Equations (19)–(21) manage the energy balance. Equation (19) updates the heat inventory $\hat{q}_t$ while keeping it within the storage capacity limits $[\mathit{PC}_{\min}, \mathit{PC}_{\max}]$. Any excess heat beyond the maximum capacity is sold to the external network ($\hat{r}_t$), as defined in Equation (20), while any shortfall below the minimum capacity triggers a purchase from the network ($\hat{s}_t$), as defined in Equation (21).
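A minimal Python sketch of one rollout step under Equations (15)–(21), followed by a loop over the horizon, is shown below. Two points are assumptions rather than details taken from the paper: the net inventory before clipping is computed as $\hat{q}_{t-1} + \hat{x}_t - D_t$ (production adds heat and demand withdraws it, as in balance constraint (2) and the description of excess and shortfall), and the per-step cost uses the same four terms and signs as Equations (1) and (23). The `policy` callable is a hypothetical stand-in for the trained MATN, and the unit is assumed to start switched off.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Step:
    x: float  # production quantity, Eq. (15)
    y: int    # on/off state, Eq. (16)
    z: int    # startup indicator, Eq. (17)
    w: int    # shutdown indicator, Eq. (18)
    q: float  # heat inventory after clipping, Eq. (19)
    r: float  # excess heat sold, Eq. (20)
    s: float  # shortfall purchased, Eq. (21)


def rollout_step(level: int, levels: Sequence[float], demand: float, q_prev: float,
                 y_prev: int, pc_min: float, pc_max: float) -> Step:
    x = levels[level]                          # predicted class index -> F_i
    y = 1 if x > 0 else 0
    z = 1 if (y == 1 and y_prev == 0) else 0
    w = 1 if (y == 0 and y_prev == 1) else 0
    v = q_prev + x - demand                    # ASSUMED sign: production in, demand out
    if v >= pc_max:                            # clip at the upper bound, sell the excess
        return Step(x, y, z, w, pc_max, v - pc_max, 0.0)
    if v <= pc_min:                            # clip at the lower bound, buy the shortfall
        return Step(x, y, z, w, pc_min, 0.0, pc_min - v)
    return Step(x, y, z, w, v, 0.0, 0.0)


def rollout(policy: Callable[[int, float, float], int], levels: Sequence[float],
            demand: Sequence[float], H: Sequence[float], C: Sequence[float],
            EF: Sequence[float], ET: Sequence[float], q0: float,
            pc_min: float, pc_max: float) -> Tuple[List[Step], float]:
    """Autoregressive rollout; `policy(t, q_prev, x_prev)` is a stand-in for MATN."""
    q, x_prev, y_prev = q0, 0.0, 0             # assumed: unit initially off
    plan: List[Step] = []
    cost = 0.0
    for t, d in enumerate(demand):
        level = policy(t, q, x_prev)           # MATN would also see the 24 h history
        st = rollout_step(level, levels, d, q, y_prev, pc_min, pc_max)
        # stage cost with the same four terms and signs as Eqs. (1) and (23)
        cost += H[t] * st.x + C[t] * st.q + EF[t] * st.s + ET[t] * st.r
        plan.append(st)
        q, x_prev, y_prev = st.q, st.x, st.y
    return plan, cost
```

For instance, with hypothetical levels `[0.0, 10.0, 20.0, 30.0]`, storage bounds of 10 and 50, a previous inventory of 40, and a demand of 5, choosing the highest level yields a net value of 65, so the inventory is clipped to 50 and 15 units are sold.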

4.3. Proposed MATN-Based Heat Production Planning Model

This study introduces MATN, an architecture designed to analyze complex multi-periodic time-series patterns and to make robust sequential decisions in the context of DHS planning. MATN integrates temporal feature extraction with a sequence-to-sequence learning paradigm, providing a solution tailored to the challenges of DHS operation. The overall structure of the model is illustrated in Figure 3. MATN processes the input time series over a lookback window and predicts the production level for the target time step.

The MATN encoder branch extracts multi-periodic patterns from the lookback window. It begins by applying a fast Fourier transform (FFT) to compute a frequency spectrum and selects the Top-k dominant periods. For each selected period, the time series is reshaped into a two-dimensional representation, which is processed by a parameter-efficient inception block [44]. The outputs of these period-specific streams are then combined, passed through a softmax layer, and projected with positional encoding [57]. This process yields a compressed memory sequence that summarizes the critical temporal context and multi-periodic features of the input for the subsequent decoder.
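The FFT-based period selection and two-dimensional reshaping can be sketched in the style of TimesNet: compute the amplitude spectrum of the lookback window, ignore the zero-frequency (mean) term, take the Top-k frequencies $f$, convert each to a period length $\lfloor L / f \rfloor$, and fold the zero-padded series into a grid with one period per row. This sketch is a simplification under stated assumptions: it uses a direct O(L²) discrete Fourier transform from the standard library instead of an FFT routine (adequate for L = 24), and it works on a single raw series, whereas the actual encoder operates on learned multichannel embeddings. The function names are hypothetical.

```python
import cmath
from typing import List, Sequence


def amplitude_spectrum(series: Sequence[float]) -> List[float]:
    """Amplitudes |X_f| for f = 0..L//2 via a direct DFT (same values an FFT would give)."""
    n = len(series)
    amps = []
    for f in range(n // 2 + 1):
        acc = sum(v * cmath.exp(-2j * cmath.pi * f * t / n) for t, v in enumerate(series))
        amps.append(abs(acc))
    return amps


def top_k_periods(series: Sequence[float], k: int) -> List[int]:
    """Return the k dominant period lengths, skipping the DC term (f = 0)."""
    amps = amplitude_spectrum(series)
    amps[0] = 0.0  # ignore the mean level
    freqs = sorted(range(1, len(amps)), key=lambda f: amps[f], reverse=True)[:k]
    return [len(series) // f for f in freqs]


def fold(series: Sequence[float], period: int) -> List[List[float]]:
    """Zero-pad to a multiple of `period` and reshape into rows of length `period`."""
    padded = list(series) + [0.0] * (-len(series) % period)
    return [padded[i:i + period] for i in range(0, len(padded), period)]
```

With Top-k = 1, the setting selected in Section 5.2, only the single strongest period would be used for the two-dimensional view.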

The MATN decoder branch is responsible for sequentially generating the optimal production plan by leveraging the rich temporal context provided by the encoder’s memory sequence. As shown in Figure 3, the decoder first employs a masked self-attention mechanism to ensure its predictions are internally coherent without referencing future information. Subsequently, a multi-head attention mechanism queries the encoder’s memory, allowing the model to focus on the historical periodic patterns most relevant to the current decision-making step [57]. The output from the attention layers is then processed through a feed-forward network and, finally, a classifier head outputs a probability distribution over the discrete production levels for the target time step.
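The masking idea in the decoder can be illustrated with plain scaled dot-product attention, $\mathrm{softmax}(QK^\top/\sqrt{d})\,V$. The sketch below is a single-head, weight-free version on small lists: with `causal=True`, position $i$ can attend only to positions $j \le i$, mirroring masked self-attention; with `causal=False` and keys and values taken from an encoder memory, it corresponds to the cross-attention step. A multi-head layer would run several such heads on learned projections and concatenate their outputs; that part is omitted, and all names are illustrative rather than MATN's implementation.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix, causal: bool = False) -> Matrix:
    """Scaled dot-product attention for one head: softmax(Q K^T / sqrt(d)) V."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        if causal:
            # mask future positions j > i; exp(-inf) becomes a zero weight
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        out.append([sum(wt * vj[c] for wt, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out
```

With the causal mask, the first output row can only copy the first value vector, which is exactly the property that keeps the decoder from referencing future steps.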

4.4. Cost-Aware Loss Function

To train the MATN model effectively and align its decisions with real-world operational objectives, this study employs a multi-objective, cost-aware loss function. This function guides the model beyond simple imitation learning by integrating explicit penalties for operational inefficiency and unstable production schedules. The total loss for training MATN, presented in Equation (25), is a weighted sum of three distinct components.

L_{\mathrm{CE}} = -\frac{1}{T} \sum_{t=1}^{T} \sum_{i=1}^{K} \alpha_i u_{i,t} \log p_{i,t}

(22)

L_{\mathrm{cost}} = \frac{1}{T} \sum_{t=1}^{T} \left( H_t \hat{x}_t + C_t \hat{q}_t + \mathit{EF}_t \hat{s}_t + \mathit{ET}_t \hat{r}_t \right)

(23)

L_{\mathrm{switch}} = \frac{1}{T-1} \sum_{t=2}^{T} \lvert \hat{y}_t - \hat{y}_{t-1} \rvert

(24)

L_{\mathrm{total}} = L_{\mathrm{CE}} + \lambda_{\mathrm{cost}} L_{\mathrm{cost}} + \lambda_{\mathrm{switch}} L_{\mathrm{switch}}

(25)

This training objective combines three loss terms, each targeting a specific goal. First, the classification loss $L_{\mathrm{CE}}$ in Equation (22) uses weighted cross-entropy to train the model to replicate the optimal decisions generated by the MIP, providing a strong foundation in imitation learning. Second, the cost term $L_{\mathrm{cost}}$ in Equation (23) incorporates a differentiable surrogate for the total operating cost. This term makes the model directly sensitive to economic factors such as production and inventory costs, aligning its decisions with the primary objective of cost minimization. Finally, the switching regularizer $L_{\mathrm{switch}}$ in Equation (24) penalizes frequent toggling between production states, which promotes operational stability and yields practical schedules that avoid equipment wear from constant adjustments. The total loss $L_{\mathrm{total}}$ in Equation (25) combines these three components, and the coefficients $\lambda_{\mathrm{cost}}$ and $\lambda_{\mathrm{switch}}$ balance the trade-off between economic efficiency and operational stability [52,53,54,55].
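Equations (22), (24), and (25) can be evaluated numerically as in the sketch below, which takes per-step class probabilities, one-hot MIP labels, class weights $\alpha_i$, and rollout on/off states. The defaults $\lambda_{\mathrm{cost}} = 0.3$ and $\lambda_{\mathrm{switch}} = 0.1$ are the values selected in Section 5.2, and the cost term is passed in precomputed (for example, as the average of the stage costs from a rollout). This plain-Python version only illustrates the arithmetic; an actual training loop would compute these terms on tensors in an autodiff framework such as PyTorch, and the sketch does not address how the terms are made differentiable.

```python
import math
from typing import Sequence


def weighted_ce(probs: Sequence[Sequence[float]], labels: Sequence[Sequence[int]],
                alpha: Sequence[float]) -> float:
    """Eq. (22): -(1/T) * sum_t sum_i alpha_i * u_{i,t} * log p_{i,t}."""
    total = 0.0
    for p_t, u_t in zip(probs, labels):
        for a_i, u_i, p_i in zip(alpha, u_t, p_t):
            if u_i:  # terms with u_{i,t} = 0 vanish, so only the MIP level contributes
                total += a_i * math.log(p_i)
    return -total / len(probs)


def switch_penalty(y: Sequence[int]) -> float:
    """Eq. (24): mean absolute change of the on/off state over T - 1 transitions."""
    return sum(abs(y[t] - y[t - 1]) for t in range(1, len(y))) / (len(y) - 1)


def total_loss(l_ce: float, l_cost: float, l_switch: float,
               lam_cost: float = 0.3, lam_switch: float = 0.1) -> float:
    """Eq. (25): weighted sum of the three components."""
    return l_ce + lam_cost * l_cost + lam_switch * l_switch
```

Because the cost term is expressed in monetary units while the other two terms are dimensionless, the relative scale of the $\lambda$ weights depends on how costs are normalized, which is one reason a sensitivity analysis over these weights is useful.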

5. Computational Experiments

5.1. Datasets

To evaluate the performance of the proposed MATN model, we conducted a series of experiments comparing it against a deep neural network (DNN)-based approach [23] that uses 24 h look-ahead inputs. All models were evaluated on a separate set of test scenarios, and the generated plans were assessed in terms of accuracy, cost, and robustness. Accuracy is the fraction of time steps at which the production level chosen by the learned policy matches the theoretical optimum from the MIP reference under the same scenario state. Cost is the total operating cost obtained by simulating each 120 h plan in an autoregressive rollout using the calculated demand and prices. Robustness is evaluated by repeating the training with different random seeds that control initialization and data shuffling and reporting the mean and standard deviation across runs. Where appropriate, we also performed paired per-scenario significance tests to verify that improvements over the baseline were consistent.
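The evaluation metrics defined above reduce to simple computations, sketched here with the Python standard library: per-scenario accuracy as the share of matching production levels, the mean and standard deviation across training seeds, and per-scenario cost differences that a paired test would operate on. Using the sample standard deviation (`statistics.stdev`) is an assumption, and the specific significance test applied to the paired differences is not shown.

```python
import statistics
from typing import List, Sequence, Tuple


def accuracy(predicted: Sequence[int], optimal: Sequence[int]) -> float:
    """Fraction of time steps where the policy's level equals the MIP level."""
    if len(predicted) != len(optimal) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p == o for p, o in zip(predicted, optimal)) / len(predicted)


def mean_std(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and (assumed sample) standard deviation across training seeds."""
    return statistics.mean(values), statistics.stdev(values)


def paired_differences(baseline: Sequence[float], model: Sequence[float]) -> List[float]:
    """Per-scenario cost differences (baseline - model); positive means the model is cheaper."""
    return [b - m for b, m in zip(baseline, model)]
```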

The foundation of our dataset was the operational records from a real-world district heating system in South Korea. As the representative sample in Figure 4 shows, these historical data exhibit clear daily and weekly periodicities. However, relying solely on these raw records is insufficient for robust model training, as they may not cover the full range of operational conditions and edge cases that the model must handle.

To overcome this limitation and create a comprehensive dataset, we used the historical data as a basis for generating a large-scale set of diverse simulated scenarios. This process involved augmenting the underlying patterns of the historical demand with calibrated noise and pairing these augmented profiles with various simulated prices. The MIP model described in Section 3 then determined the theoretically optimal production plan for each simulated scenario. This generation yielded a total of 1,800,000 h of optimal operational data, partitioned into a 1,440,000 h training set and a 360,000 h validation set. A separate test set of 12,000 h, structured as 100 independent 120 h scenarios, was also constructed to represent a balanced mix of seasonal and weekly patterns. Production costs were converted to USD using an exchange rate of 1388.89 KRW per 1 USD, as of 10:44 AM UTC on 17 July 2025. All experiments were conducted on a workstation with an AMD Ryzen 7 9700X 8-Core CPU (Advanced Micro Devices, Inc., Santa Clara, CA, USA), 81.6 GB RAM, and an NVIDIA GeForce RTX 4070 SUPER (NVIDIA Corporation, Santa Clara, CA, USA), using Python 3.10.9, PyTorch 2.5.1, and Gurobi Optimizer 12.0.3 (Gurobi, Inc., Beaverton, OR, USA).
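A quick arithmetic check confirms that the dataset sizes reported above are internally consistent: the 1,440,000 h and 360,000 h sets form an 80/20 split of 1,800,000 h, and 100 test scenarios of 120 h give 12,000 h. The `krw_to_usd` helper is a hypothetical convenience for reproducing USD figures at the stated rate.

```python
KRW_PER_USD = 1388.89  # exchange rate stated in the text (17 July 2025)

TOTAL_H = 1_800_000
TRAIN_H = 1_440_000
VAL_H = 360_000
TEST_SCENARIOS, SCENARIO_H = 100, 120


def krw_to_usd(krw: float) -> float:
    """Convert a cost in KRW to USD at the fixed rate above."""
    return krw / KRW_PER_USD


train_share = TRAIN_H / TOTAL_H            # 0.8, i.e., an 80/20 train/validation split
test_hours = TEST_SCENARIOS * SCENARIO_H   # 12,000 h
```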

5.2. Results and Analysis

A systematic hyperparameter optimization was conducted to identify the most effective MATN configuration. This process evaluated key structural parameters, including the number of decoder layers (from two to four), the number of attention heads, and the number of Top-k dominant periods. Table 3 presents a selection of representative results, focusing on the two-layer configurations, which consistently showed the most promising outcomes.

The results in Table 3 highlight two distinct optimal configurations based on different performance objectives (in Table 3, bold indicates the highest accuracy and the lowest cost). First, configuration A4 achieved the highest accuracy of 80.13% using two decoder layers, four attention heads, and a Top-k value of one. Second, for operational efficiency, configuration A7 yielded the lowest cost of 188,707.79 USD, also with two decoder layers but with eight attention heads and a Top-k value of one.

Based on this analysis, two candidate architectures were selected for further evaluation: configuration A4 as the accuracy-optimized model and configuration A7 as the cost-optimized model. This outcome suggests that, for this application, a moderate number of attention heads is best for accuracy, whereas a larger number may benefit cost efficiency, and that focusing on only the single most dominant period is the most effective strategy for both.

Figure 5 presents the results of a sensitivity analysis on the cost-aware loss function weights. Following the previous experiment, this analysis was performed on the two selected candidate configurations, where $N_L$, $N_H$, and $N_T$ denote the number of decoder layers, the number of attention heads, and the Top-k value, respectively: the cost-optimized configuration (a) with $N_L = 2$, $N_H = 8$, and $N_T = 1$, and the accuracy-optimized configuration (b) with $N_L = 2$, $N_H = 4$, and $N_T = 1$. The objective was to find the best values of the cost term weight and the switching penalty for each configuration. The search ranges for these weights were set based on values previously found to be effective in similar energy optimization and control tasks [52,53,54,55].

The heatmaps show that the two configurations respond differently to the weights. The cost-optimized configuration (a) exhibits a distinct optimal region, achieving the best balance of high accuracy and low operational cost at $\lambda_{\mathrm{cost}} = 0.3$ and $\lambda_{\mathrm{switch}} = 0.1$. In contrast, for the accuracy-optimized configuration (b), increasing the weights generally leads to higher costs without a corresponding significant improvement in accuracy.

This analysis demonstrates that the cost-optimized architecture ($N_L = 2$, $N_H = 8$, $N_T = 1$) is highly responsive to the tuning of the cost-aware loss function, exhibiting a clear optimum that improves both performance metrics. Given this outcome, the cost-optimized configuration with $\lambda_{\mathrm{cost}} = 0.3$ and $\lambda_{\mathrm{switch}} = 0.1$ was selected as the final MATN model for all subsequent experiments.

As shown in Figure 6, the loss dropped quickly during the first few epochs and then flattened. Training and validation accuracy reached a stable plateau by epoch 10. This pattern shows that the model learns the most useful structure early and that longer training gives little gain in generalization, while the risk of overfitting increases. We therefore evaluated the saved checkpoints at epochs 10 to 50 using the same test protocol with autoregressive rollout and the deployment cost function. The checkpoint at epoch 10 gave the highest test accuracy and the lowest average cost among the evaluated epochs. Later checkpoints showed no improvement in accuracy and tended to increase the cost.

Table 4 shows that across all ten test sets, MATN achieved a lower total operation cost and a higher decision accuracy compared to the DNN baseline. The average cost for MATN was 187,441 USD, significantly lower than the 199,251 USD for DNN, and approached the theoretical optimum of 182,608 USD achieved by the MIP model. On average, MATN reduced operational costs by 5.93% and improved accuracy by 3.01 percentage points over DNN.
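The relative figures follow directly from the average costs above. The sketch below reproduces the 5.93% reduction of MATN relative to DNN from the averages and, from the same three numbers, each learned method's gap to the MIP optimum: about 2.65% for MATN and 9.11% for DNN.

```python
def relative_reduction(reference: float, value: float) -> float:
    """Percentage by which `value` is below `reference`."""
    return 100.0 * (reference - value) / reference


def optimality_gap(cost: float, optimum: float) -> float:
    """Percentage by which `cost` exceeds the MIP optimum."""
    return 100.0 * (cost - optimum) / optimum


AVG_COST_USD = {"MIP": 182_608.0, "DNN": 199_251.0, "MATN": 187_441.0}  # averages from the text

reduction = relative_reduction(AVG_COST_USD["DNN"], AVG_COST_USD["MATN"])
gap_matn = optimality_gap(AVG_COST_USD["MATN"], AVG_COST_USD["MIP"])
gap_dnn = optimality_gap(AVG_COST_USD["DNN"], AVG_COST_USD["MIP"])
print(f"reduction {reduction:.2f}%, MATN gap {gap_matn:.2f}%, DNN gap {gap_dnn:.2f}%")
```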

This advantage in both cost and accuracy was consistently observed across all test scenarios, demonstrating the model's reliability. In Table 4, bold indicates the best model for each metric (the lowest cost and the highest accuracy). These results indicate that the cost-aware training of MATN effectively produces economically efficient actions without compromising predictive quality. Ultimately, this approach yields plans that are not only more accurate but also substantially more cost-effective than those of the baseline DNN.

Figure 7 compares the average hourly heat production and corresponding operational costs for the MIP, DNN, and MATN models. The MIP benchmark demonstrates a binary production strategy, maintaining output near zero during low-demand periods by utilizing stored heat. In stark contrast, the DNN model shows significant deviation during these same periods. Its heat production fluctuates, reaching peaks of approximately 40 units when the MIP output is zero. As a direct consequence, shown in the bottom panel, this unnecessary production incurs substantial hourly costs, peaking near 1000 USD while MIP costs remain minimal. The proposed MATN model, however, closely mirrors the operational profile of the MIP benchmark. During low-demand intervals, its production remains stable and near zero. This stability results in a cost profile that is also consistently low, closely tracking the optimal trajectory set by MIP. These results demonstrate that MATN’s production strategy directly leads to lower and less volatile operational costs compared to the baseline DNN.

Table 5 presents the results of the short-horizon autoregressive accuracy evaluation, conducted to examine near-term control fidelity. The results show a consistent performance advantage for MATN across all tested horizons. For the shortest horizon (H = 4), MATN achieves an average accuracy of 80.50%, compared to 63.00% for the DNN. This trend continues for longer horizons, with MATN recording accuracies of 82.08% (H = 12) and 82.38% (H = 24), outperforming the DNN’s 72.58% and 75.83%, respectively. The largest performance gap between the two models is observed at the shortest horizon of H = 4.
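The statement that the gap is largest at H = 4 can be checked from the reported averages. The sketch computes MATN's percentage-point margin over DNN for each horizon; the `horizon_accuracy` helper shows one plausible reading of short-horizon accuracy (matches over the first H rollout steps), which is an assumption rather than the paper's stated definition.

```python
from typing import Sequence


def horizon_accuracy(predicted: Sequence[int], optimal: Sequence[int], h: int) -> float:
    """Share of matching production levels over the first h rollout steps (assumed definition)."""
    return sum(p == o for p, o in zip(predicted[:h], optimal[:h])) / h


# Average accuracies (%) reported above: horizon H -> (MATN, DNN)
REPORTED = {4: (80.50, 63.00), 12: (82.08, 72.58), 24: (82.38, 75.83)}

margins = {h: round(matn - dnn, 2) for h, (matn, dnn) in REPORTED.items()}
largest_gap_horizon = max(margins, key=margins.get)
print(margins, largest_gap_horizon)
```

The margins shrink from 17.50 to 9.50 to 6.55 percentage points as the horizon grows, mainly because DNN's accuracy improves with longer horizons while MATN's stays near 80–82%.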

This consistent outperformance suggests that MATN is more effective at making accurate near-term predictions (in Table 5, bold indicates the highest accuracy in each row). The large accuracy margin at H = 4, in particular, indicates that the model excels at capturing the immediate, local, intraday regularities that are most critical for short-term operational decisions.

6. Conclusions

This study proposed MATN, an AI-based methodology for real-time production planning in district heating systems that overcomes the critical limitation of reliance on future forecasts. Using only historical data, we designed a hybrid architecture that integrates a TimesNet-based encoder to capture periodic patterns from time series and a Transformer-based decoder to support robust sequential decision making. Furthermore, to move beyond simple classification accuracy and induce economically sound and stable operation, we introduced a cost-aware loss function that jointly accounts for operational costs and switching stability.

The experimental results support the effectiveness of the proposed methodology. Averaged over the ten test datasets, MATN achieved a 5.93% reduction in total operating cost and a 3.01 percentage point increase in accuracy relative to a prior DNN baseline [23] that required future look-ahead information. Qualitative analysis of the operational profiles showed that MATN learned to generate stable, conservative plans that closely mirrored those of the optimal MIP benchmark. In particular, it kept production near zero during low-demand periods and thereby avoided the volatile and inefficient behavior of the baseline DNN. These findings suggest that the cost-aware training approach enables the model to internalize the economic logic of DHS operations, leading to both enhanced stability and superior economic efficiency.
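The headline figures can be reproduced from the averages reported in Table 4, as the short calculation below shows. The optimality gaps relative to MIP are derived from the same averages and are not stated separately in the text.

```python
# Table 4 averages over the ten test datasets
mip_cost, dnn_cost, matn_cost = 182_607.67, 199_250.96, 187_440.61   # USD
dnn_acc, matn_acc = 77.72, 80.73                                      # %

cost_reduction_pct = (dnn_cost - matn_cost) / dnn_cost * 100   # relative to the DNN
accuracy_gain_pp = matn_acc - dnn_acc                          # percentage points
matn_gap_pct = (matn_cost - mip_cost) / mip_cost * 100         # distance from the MIP optimum
dnn_gap_pct = (dnn_cost - mip_cost) / mip_cost * 100

print(f"{cost_reduction_pct:.2f}% cost reduction")   # 5.93% cost reduction
print(f"{accuracy_gain_pp:.2f} pp accuracy gain")    # 3.01 pp accuracy gain
print(f"MATN gap {matn_gap_pct:.2f}%, DNN gap {dnn_gap_pct:.2f}%")  # MATN gap 2.65%, DNN gap 9.11%
```

Note that the 5.93% figure is a relative reduction with respect to the DNN cost. The accuracy improvement, by contrast, is an absolute difference in percentage points.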

Nevertheless, MATN was validated only on a single DHS plant, and its training data were generated from one specific MIP configuration that reflects the operation and utility of the partner plant. As a result, the empirical results do not yet show how well the approach would generalize to more complex multi-plant or network-coupled DHSs. In systems with multiple production units, shared thermal storage and complex network flows introduce additional decision variables and constraints, which make the planning problem more challenging. Extending MATN to such settings will be tested in future work. Moreover, this study does not use exogenous variables such as outdoor temperature, weather forecasts, or electricity prices, even though such signals can be informative for both prediction and planning. Including these variables as additional input channels and systematically assessing their impact is another important direction.

From a practical deployment perspective, MATN is intended to serve as a decision-support component within existing DHS operator workflows rather than as a fully autonomous controller. In a real-world setting, the model would generate cost-aware production plans and scenario-based forecasts. These would be presented to operators together with the associated operating costs, switching patterns, and constraint margins. Such information can improve interpretability by making the economic and technical trade-offs behind each recommendation more transparent. At the same time, additional work is required to ensure safe operation, including hard safety constraints in downstream optimization, operator override mechanisms, and logging of accepted and rejected plans so that operator feedback can be incorporated into future model updates.
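One possible form of the hard-constraint check and plan logging mentioned above is sketched below. The check validates a recommended hourly plan against the production capacity bounds $P_{min}$ and $P_{max}$ and the minimum operation time $OT$ from Table 1 before the plan reaches an operator, and then records the operator's decision. The function and record names are hypothetical. The sketch covers only these two constraint families and treats the facility as operating whenever $x_t > 0$; storage bounds and network constraints would have to be added for real use.

```python
from dataclasses import dataclass
from typing import List


def check_plan(x: List[float], p_min: float, p_max: float, min_on_hours: int) -> List[str]:
    """Return human-readable violations of capacity and minimum-operation-time rules."""
    violations = []
    on = [xt > 0.0 for xt in x]  # assumption: facility is operated whenever x_t > 0
    for t, xt in enumerate(x):
        if on[t] and not (p_min <= xt <= p_max):
            violations.append(f"t={t}: production {xt} outside [{p_min}, {p_max}]")
    # Minimum operation time: every completed on-run must last at least OT hours.
    t = 0
    while t < len(x):
        if on[t]:
            start = t
            while t < len(x) and on[t]:
                t += 1
            run = t - start
            if t < len(x) and run < min_on_hours:  # runs reaching the horizon end are not judged
                violations.append(f"t={start}: on-run of {run} h shorter than OT={min_on_hours} h")
        else:
            t += 1
    return violations


@dataclass
class PlanLogEntry:
    plan: List[float]
    violations: List[str]
    accepted: bool
    operator_note: str = ""


plan = [0.0, 20.0, 20.0, 0.0, 60.0, 0.0]
issues = check_plan(plan, p_min=10.0, p_max=50.0, min_on_hours=3)
log = [PlanLogEntry(plan, issues, accepted=not issues,
                    operator_note="auto-rejected" if issues else "")]
print(issues)
```

Keeping rejected plans in the log, together with their violations, provides labeled feedback that could later be used when the model is updated.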

Author Contributions

Conceptualization, J.K., I.-B.P. and K.K.; methodology, J.K.; software and validation, J.K. and S.L.; formal analysis and investigation, J.K., S.L. and I.-B.P.; data curation, J.K., S.L. and I.-B.P.; writing—original draft preparation, J.K., I.-B.P. and K.K.; writing—review and editing, I.-B.P. and K.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2025-24523405) and supported by the Dongguk University Research Fund of 2025.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Lund, H.; Werner, S.; Wiltshire, R.; Svendsen, S.; Thorsen, J.E.; Hvelplund, F.; Mathiesen, B.V. 4th Generation District Heating (4GDH): Integrating smart thermal grids into future sustainable energy systems. Energy 2014, 68, 1–11.
2. IEA. District Heating—Energy System. Available online: https://www.iea.org/energy-system/buildings/district-heating (accessed on 15 October 2025).
3. Pardo-Bosch, F.; Blanco, A.; Mendoza, N.; Libreros, B.; Tejedor, B.; Pujadas, P. Sustainable deployment of energy efficient district heating: City business model. Energy Policy 2023, 181, 113701.
4. Rušeljuk, P.; Lepiksaar, K.; Siirde, A.; Volkova, A. Economic Dispatch of CHP Units through District Heating Network’s Demand-Side Management. Energies 2021, 14, 4553.
5. Huang, S.; Tang, W.; Wu, Q.; Li, C. Network constrained economic dispatch of integrated heat and electricity systems. Energy 2019, 179, 464–474.
6. Siddique, M.B.; Zeng, T.; Yang, M.; Peng, S.; Gao, C.; Yang, Y. Dispatch strategies for large-scale heat pump based district heating under high renewable share and risk-aversion: A multistage stochastic optimization approach. Energy Econ. 2024, 136, 107764.
7. Li, J.; Xue, Y.; Du, Y.; Pan, Z.; Zhang, J.; Shao, Y.; Sun, H. Coordinated economic dispatch of the primary and secondary heating systems in district heating. Front. Energy Res. 2023, 10, 1005784.
8. Gadd, H.; Werner, S. Thermal energy storage systems for district heating and cooling. In Advances in Thermal Energy Storage Systems: Methods and Applications, 2nd ed.; Cabeza, L.F., Ed.; Woodhead Publishing/Elsevier: Oxford, UK, 2021; pp. 625–638.
9. Sihvonen, V.; Ollila, I.; Jaanto, J.; Grönman, A.; Honkapuro, S.; Riikonen, J.; Price, A. Role of power-to-heat and thermal energy storage in district heating systems. Energy 2024, 305, 132372.
10. Huo, S.; Wang, J.; Qin, Y.; Cui, Z. Operation optimization of district heating network under typical modes for improving the economic and flexibility performances of integrated energy system. Energy Convers. Manag. 2022, 267, 115904.
11. Benonysson, A.; Bøhm, B.; Ravn, H.F. Operational optimization in a district heating system. Energy Convers. Manag. 1995, 36, 297–314.
12. Franco, A.; Versace, M. Multi-objective optimization for the maximization of the operating share of cogeneration system in district heating network. Energy Convers. Manag. 2017, 139, 33–44.
13. Dorotić, H.; Pukšec, T.; Duić, N. Multi-objective optimization of district heating and cooling systems for a one-year time horizon. Energy 2019, 169, 319–328.
14. Rasoulian, H. Reliability, Availability and Resilience Assessment of Heating Systems. Master’s Thesis, Concordia University, Montreal, QC, Canada, 2022. Available online: https://spectrum.library.concordia.ca/id/eprint/991121/ (accessed on 15 October 2025).
15. Rafati, A.; Tahavori, M.; Shaker, H.R. Data-driven reliability analysis of district heating systems for asset management applications: A review. Sustain. Cities Soc. 2025, 118, 106052.
16. Protić, M.; Shamshirband, S.; Petković, D.; Abbasi, A.; Kiah, M.L.M.; Unar, J.A.; Živković, L.; Raos, M. Forecasting of consumers heat load in district heating systems using the support vector machine with a discrete wavelet transform algorithm. Energy 2015, 87, 343–351.
17. Fang, T.; Lahdelma, R. Evaluation of a multiple linear regression model and SARIMA model in forecasting heat demand for district heating system. Appl. Energy 2016, 179, 544–552.
18. Reda, F.; Ruggiero, S.; Auvinen, K.; Temmes, A. Towards low-carbon district heating: Investigating the socio-technical challenges and opportunities. Smart Energy 2021, 4, 100054.
19. Danfoss. The Importance of System Boundaries When Evaluating the Energy Efficiency of District Heating Systems. 2025. Available online: https://www.danfoss.com/en/about-danfoss/articles/dhs/the-importance-of-system-boundaries-when-evaluating-the-energy-efficiency-of-district-heating-systems/ (accessed on 15 October 2025).
20. Mertz, T.; Serra, S.; Henon, A.; Reneaume, J.-M. A MINLP optimization of the configuration and the design of a district heating network: Academic study cases. Energy 2016, 117, 450–464.
21. Weinand, J.M.; Kleinebrahm, M.; McKenna, R.; Mainzer, K.; Fichtner, W. Developing a combinatorial optimisation approach to design district heating networks based on deep geothermal energy. Appl. Energy 2019, 251, 113367.
22. Sameti, M.; Haghighat, F. Optimization of 4th generation distributed district heating system: Design and planning of combined heat and power. Renew. Energy 2019, 130, 371–387.
23. Lee, D.; Yoon, S.M.; Lee, J.; Kim, K.; Song, S.H. Applying deep learning to the heat production planning problem in a district heating system. Energies 2020, 13, 6641.
24. Bouktif, S.; Fiaz, A.; Ouni, A.; Serhani, M. Optimal deep learning LSTM model for electric load forecasting using feature selection and genetic algorithm: Comparison with machine learning approaches. Energies 2018, 11, 1636.
25. Ryu, S.; Noh, J.; Kim, H. Deep neural network based demand side short term load forecasting. Energies 2017, 10, 3.
26. Lee, D.; Kim, K. Recurrent neural network-based hourly prediction of photovoltaic power output using meteorological information. Energies 2019, 12, 215.
27. Shi, H.; Xu, M.; Li, R. Deep learning for household load forecasting—A novel pooling deep RNN. IEEE Trans. Smart Grid 2018, 9, 5271–5280.
28. Vesterlund, M.; Toffolo, A.; Dahl, J. Optimization of multi-source complex district heating network, a case study. Energy 2017, 126, 53–63.
29. Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; Long, M. TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. In Proceedings of the International Conference on Learning Representations (ICLR 2023), Kigali, Rwanda, 1–5 May 2023; Available online: https://openreview.net/forum?id=ju_Uqw384Oq (accessed on 15 October 2025).
30. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS) 2017, Long Beach, CA, USA, 4–9 December 2017; Curran Associates, Inc.: New York, NY, USA, 2017. Available online: https://papers.neurips.cc/paper/7181-attention-is-all-you-need.pdf (accessed on 15 October 2025).
31. Sameti, M.; Haghighat, F. Optimization approaches in district heating and cooling thermal network. Energy Build. 2017, 140, 121–130.
32. Jie, P.; Zhu, N.; Li, D. Operation optimization of existing district heating systems. Appl. Therm. Eng. 2015, 80, 20–28.
33. Talebi, B.; Mirzaei, P.A.; Bastani, A.; Haghighat, F. A review of district heating systems: Modeling and optimization. Front. Built Environ. 2016, 2, 22.
34. Lesko, M.; Bujalski, W.; Futyma, K. Operational optimization in district heating systems with the use of thermal energy storage. Energy 2018, 165, 902–915.
35. Qin, C.; Yan, Q.; He, G. Integrated energy systems planning with electricity, heat and gas using particle swarm optimization. Energy 2019, 188, 116044.
36. Wang, H.; Wang, H.; Haijian, Z.; Zhu, T. Optimization modeling for smart operation of multi-source district heating with distributed variable-speed pumps. Energy 2017, 138, 1247–1262.
37. Wu, C.; Gu, W.; Jiang, P.; Li, Z.; Cai, H.; Li, B. Combined economic dispatch considering the time-delay of district heating network and multi-regional indoor temperature control. IEEE Trans. Sustain. Energy 2017, 8, 1709–1719.
38. Riedmüller, S.; Rivetta, F.; Zittel, J. Long-Term Multi-Objective Optimization for Integrated Unit Commitment and District Heating Scheduling. arXiv 2024, arXiv:2410.06673.
39. Sporleder, M.; Rath, M.; Ragwitz, M. Design optimization of district heating systems: A review. Front. Energy Res. 2022, 10, 971912.
40. Kopanos, G.M.; Murele, O.C.; Silvente, J.; Zhakiyev, N.; Akhmetbekov, Y.; Tutkushev, D. Efficient planning of energy production and maintenance of large-scale combined heat and power plants. Energy Convers. Manag. 2018, 171, 1304–1317.
41. Sandberg, A.; Wallin, F.; Li, H.; Azaza, M. An analyze of long-term hourly district heat demand forecasting of a commercial building using neural networks. Energy Procedia 2017, 105, 3784–3790.
42. Marino, D.L.; Amarasinghe, K.; Manic, M. Building energy load forecasting using deep neural networks. In Proceedings of the IECON 2016—42nd Annual Conference of the IEEE Industrial Electronics Society, Florence, Italy, 23–26 October 2016; pp. 7046–7051.
43. Rahman, A.; Smith, A.D. Predicting heating demand and sizing a stratified thermal storage tank using deep learning algorithms. Appl. Energy 2018, 228, 108–121.
44. Lu, K.; Meng, X.R.; Sun, W.X.; Zhang, R.G.; Han, Y.K.; Gao, S.; Su, D. GRU-based encoder–decoder for short-term CHP heat load forecast. IOP Conf. Ser. Mater. Sci. Eng. 2018, 392, 062173.
45. Lu, K.; Bi, Z.; Wang, X.; Meng, X.; Li, H.; Sun, W.; Zhu, Z.; Liu, Z. Short-term CHP heat load forecast method based on concatenated LSTMs. In Proceedings of the 2017 Chinese Automation Congress (CAC), Jinan, China, 20–22 October 2017; pp. 99–103.
46. Yu, J.; Shen, X.; Sun, H. Economic dispatch for regional integrated energy system with district heating network under stochastic demand. IEEE Access 2019, 7, 46659–46667.
47. Jiang, M.; Rindt, C.; Smeulders, D.M.J. Optimal planning of future district heating systems—A review. Energies 2022, 15, 7160.
48. Talebi, B.; Haghighat, F.; Tuohy, P.; Mirzaei, P.A. Optimization of a hybrid community district heating system integrated with thermal energy storage system. J. Energy Storage 2019, 23, 128–137.
49. Huang, Y.; Zhao, Y.; Wang, Z.; Liu, X.; Liu, H.; Fu, Y. Explainable district heat load forecasting with active deep learning. Appl. Energy 2023, 350, 121753.
50. Van Dreven, J.; Boeva, V.; Abghari, S.; Grahn, H.; Al Koussa, J.; Motoasca, E. Intelligent approaches to fault detection and diagnosis in district heating systems. Electronics 2023, 12, 1448.
51. Weber, S.A.; Fischlschweiger, M.; Volta, D.; Geisler, J. Feature selection for specific prediction targets at the user level in district heating networks. Sci. Rep. 2025, 15, 29789.
52. Boutarene, A. Predicting Heat Load in District Heating Systems. Master’s Thesis, Stockholm University, Stockholm, Sweden, 2025. Available online: http://su.diva-portal.org/smash/get/diva2:1962718/FULLTEXT01.pdf (accessed on 15 October 2025).
53. Mocanu, E.; Mocanu, D.C.; Nguyen, P.H.; Liotta, A.; Webber, M.E.; Gibescu, M.; Slootweg, J.G. On-line building energy optimization using deep reinforcement learning. arXiv 2017, arXiv:1707.05878.
54. Wei, T.; Wang, Y.; Hong, T. Deep reinforcement learning for building HVAC control. In Proceedings of the 4th ACM International Conference on Systems for Energy-Efficient Built Environments (BuildSys), Delft, The Netherlands, 8–9 November 2017.
55. Al Sayed, K.; Boodi, A.; Sadeghian Broujeny, R.; Beddiar, K. Reinforcement learning for HVAC control in intelligent buildings: A technical and conceptual review. J. Build. Eng. 2024, 95, 110085.
56. Afram, A.; Janabi-Sharifi, F. Theory and applications of HVAC control using model predictive control—A review. Build. Environ. 2014, 72, 343–355.
57. Verheyen, J.; Thommessen, C.; Roes, J.; Hoster, H. Effects on the unit commitment of a district heating system due to seasonal aquifer thermal energy storage and solar thermal integration. Energies 2025, 18, 645.
58. Cui, M. District heating load prediction algorithm based on bidirectional long short-term memory network model. Energy 2022, 254, 124283.
59. Naguib, M.; Kollmeyer, P.; Emadi, A. Lithium-ion battery pack robust state of charge estimation, cell inconsistency, and balancing: Review. IEEE Access 2021, 9, 50570–50582.

Figure 1. DHS heat production planning and control framework.

Figure 2. MATN training and planning phases.

Figure 3. MATN model architecture.

Figure 4. DHS demand sample data.

Figure 5. Hyperparameter sensitivity for cost-awareness.

Figure 6. Training/validation loss and accuracy over 50 epochs.

Figure 7. Comparison of hourly heat production and cost profiles.

Table 1. Notations used in the proposed problem.

Notation	Description	Unit
t	Index of the unit planning period
T	Final planning period
i	Production level index
K	Maximum number of production levels
$x_{t}$	Heat production at time t	MWh
$u_{i, t}$	1 if production level i is selected at time t; 0 otherwise
$y_{t}$	1 if the facility is operated at time t; 0 otherwise
$z_{t}$	1 if the facility starts up at time t; 0 otherwise
$w_{t}$	1 if the facility shuts down at time t; 0 otherwise
$F_{i}$	Heat production volume at the production level i	MWh
$q_{t}$	Heat inventory at the heat storage at time t	MWh
$s_{t}$	Heat supply from external network at time t	MWh
$r_{t}$	Heat sales to external network at time t	MWh
$D_{t}$	Heat demand at time t	MWh
$H_{t}$	Heat production cost at time t	USD/MWh
$EF_{t}$	Heat supply cost from external network at time t	USD/MWh
$ET_{t}$	Heat sales price to external network at time t	USD/MWh
C	Heat inventory holding cost	USD/MWh
$P_{\min}, P_{\max}$	Min and max capacity of the heat production facility	MW
$PC_{\min}, PC_{\max}$	Min and max capacity of the heat storage	MW
$OT$	Min operation time of the production facility	Hour
$α_{i}$	Class weight for level $i$ in the weighted cross-entropy
$p_{i, t}$	Softmax probability that the model selects level $i$ at time $t$
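Table 1 lists a class weight $\alpha_i$ and a softmax probability $p_{i,t}$ for the weighted cross-entropy. A minimal NumPy sketch of a loss built from these quantities is given below. The weighted cross-entropy term uses the standard form $-\frac{1}{T}\sum_t \sum_i \alpha_i u_{i,t} \log p_{i,t}$, averaged over the T periods. The expected-cost term (probability-weighted $F_i H_t$) and the switching term (change in expected production between consecutive hours), together with their weights `lam_cost` and `lam_switch`, are illustrative assumptions and may differ from the paper's cost-aware loss.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cost_aware_loss(logits: np.ndarray, target: np.ndarray, alpha: np.ndarray,
                    F: np.ndarray, H: np.ndarray,
                    lam_cost: float = 0.0, lam_switch: float = 0.0,
                    eps: float = 1e-12) -> float:
    """
    logits: (T, K) model scores; target: (T,) MIP-optimal level index per hour
    alpha: (K,) class weights; F: (K,) production volume per level (MWh); H: (T,) USD/MWh
    """
    p = softmax(logits)                                   # p_{i,t}
    T = logits.shape[0]
    # Weighted cross-entropy: only the target level (u_{i,t} = 1) contributes per hour.
    ce = -np.mean(alpha[target] * np.log(p[np.arange(T), target] + eps))
    expected_x = p @ F                                    # expected production per hour
    cost_term = np.mean(expected_x * H)                   # hypothetical expected production cost
    # Hypothetical switching penalty on hour-to-hour changes in expected production.
    switch_term = np.mean(np.abs(np.diff(expected_x))) if T > 1 else 0.0
    return float(ce + lam_cost * cost_term + lam_switch * switch_term)


logits = np.zeros((2, 2))  # uniform predictions over two levels
loss = cost_aware_loss(logits, target=np.array([0, 1]), alpha=np.ones(2),
                       F=np.array([0.0, 10.0]), H=np.array([2.0, 2.0]), lam_cost=1.0)
print(round(loss, 4))  # 10.6931
```

With uniform predictions, the cross-entropy term equals $\log 2 \approx 0.6931$. The expected production of 5 MWh at 2 USD/MWh adds 10 through the cost term.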

Table 2. Model inputs and output features for MATN.

Categories	Descriptions	Number of Features
Inputs	$\hat{q}_{t-1}$, initial inventory at the beginning of time t	1
	$\hat{x}_{t-1}$, heat production at time t − 1	1
	$D_{t-23}, \dots, D_{t}$, heat demand data for the past 24 h	24
	$H_{t-23}, \dots, H_{t}$, hourly production costs for the past 24 h	24
Output	$\hat{u}_{i,t}$, target heat production level indicator at time t	K
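The following sketch assembles the inputs of Table 2 into a single 50-dimensional vector (1 + 1 + 24 + 24). The flat ordering, the `float32` dtype, and the function name `build_input` are assumptions made for illustration. The actual model may instead arrange these inputs as separate channels for the TimesNet encoder.

```python
import numpy as np


def build_input(q_prev: float, x_prev: float, demand: np.ndarray,
                cost: np.ndarray, t: int) -> np.ndarray:
    """Assemble the 50 input features of Table 2 for decision hour t (requires t >= 23)."""
    if t < 23:
        raise ValueError("need 24 h of history: t must be >= 23")
    d_window = demand[t - 23:t + 1]   # D_{t-23}, ..., D_t
    h_window = cost[t - 23:t + 1]     # H_{t-23}, ..., H_t
    return np.concatenate(([q_prev, x_prev], d_window, h_window)).astype(np.float32)


demand = np.arange(48.0)              # toy hourly demand series
cost = np.full(48, 30.0)              # toy constant production cost
feats = build_input(12.0, 5.0, demand, cost, t=30)
print(feats.shape)  # (50,)
```

In this layout, index 0 holds the inventory, index 1 holds the previous production, indices 2 to 25 hold the demand window, and indices 26 to 49 hold the cost window.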

Table 3. Structural parameter ablation with test accuracy and total cost.

Index	Layers	Heads	Top-k	Accuracy	Total Cost (USD)
A1	2	2	1	77.50%	196,578.35
A2	2	2	3	79.26%	200,380.87
A3	2	2	5	78.23%	194,987.77
A4	2	4	1	80.13%	193,867.88
A5	2	4	3	79.17%	190,568.41
A6	2	4	5	79.67%	191,530.94
A7	2	8	1	78.11%	188,707.79
A8	2	8	3	78.01%	195,272.26
A9	2	8	5	77.98%	190,729.23
A10	3	2	1	79.34%	194,792.20
A11	3	2	3	79.13%	191,766.22
A12	3	2	5	78.44%	197,323.18
A13	3	4	1	78.40%	195,178.02
A14	3	4	3	76.22%	197,208.24
A15	3	4	5	78.06%	195,949.76
A16	3	8	1	78.45%	197,886.27
A17	3	8	3	78.83%	189,428.18
A18	3	8	5	78.47%	199,672.11
A19	4	2	1	79.26%	207,002.87
A20	4	2	3	77.72%	197,400.13
A21	4	2	5	76.14%	198,520.65
A22	4	4	1	79.92%	194,236.02
A23	4	4	3	77.22%	198,277.50
A24	4	4	5	78.75%	193,950.74
A25	4	8	1	77.89%	198,959.16
A26	4	8	3	78.99%	195,893.00
A27	4	8	5	77.98%	190,729.23
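The "Top-k" column is the number of dominant periods that the TimesNet encoder extracts. In TimesNet (Wu et al., 2023), periods are selected from the FFT amplitude spectrum averaged over channels, and each selected period is then used to fold the 1D series into a 2D tensor. The NumPy sketch below reproduces only this period-selection step for a single series of shape (T, C). Batching, the 2D folding, and the learned 2D convolutions are omitted, and the function name is hypothetical.

```python
import numpy as np


def fft_top_k_periods(x: np.ndarray, k: int):
    """x: (T, C) series. Returns (periods, strengths) for the k strongest non-zero frequencies."""
    T = x.shape[0]
    amp = np.abs(np.fft.rfft(x, axis=0))          # (T//2 + 1, C) amplitude spectrum
    strength = amp.mean(axis=1)                   # average amplitude over channels
    # Skip bin 0 (the series mean) and take the k largest remaining frequency bins.
    top = np.argsort(strength[1:])[::-1][:k] + 1
    periods = T // top                            # frequency index f -> period length T // f
    return periods, strength[top]


t = np.arange(96)                                 # four days of hourly data
daily = np.sin(2 * np.pi * t / 24)
x = np.stack([daily + 0.5 * np.sin(2 * np.pi * t / 12), daily], axis=1)
periods, strengths = fft_top_k_periods(x, k=2)
print(periods)  # [24 12]
```

A larger k lets the encoder model more periodicities at once (for example, daily and half-daily cycles) but also admits weaker, noisier frequencies. This trade-off is one reason why the k value is worth tuning, as in Table 3.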

Table 4. Total cost and accuracy comparison.

Dataset	MIP Cost (USD)	DNN Cost (USD)	MATN Cost (USD)	DNN Accuracy	MATN Accuracy
1	198,658.39	220,812.98	209,543.83	76.79%	80.00%
2	184,608.80	203,229.94	189,407.38	77.00%	80.58%
3	165,845.17	179,442.22	167,510.09	76.89%	82.25%
4	196,220.41	216,606.60	206,143.42	77.26%	78.42%
5	199,945.69	215,901.00	203,376.74	76.51%	79.25%
6	170,035.34	184,604.26	172,773.43	76.60%	81.00%
7	160,868.08	173,123.14	163,521.72	76.95%	83.33%
8	190,647.88	211,028.40	194,658.41	76.99%	82.08%
9	195,720.11	211,283.42	203,182.63	77.63%	78.33%
10	163,526.79	176,477.69	164,288.45	68.50%	82.08%
Avg	182,607.67	199,250.96	187,440.61	77.72%	80.73%

Table 5. Short-horizon autoregressive accuracy.

Dataset	MATN (H = 4)	MATN (H = 12)	MATN (H = 24)	DNN (H = 4)	DNN (H = 12)	DNN (H = 24)
1	87.50%	73.33%	82.92%	80.00%	74.16%	72.91%
2	65.00%	84.17%	82.92%	60.00%	69.16%	72.91%
3	87.50%	78.33%	82.92%	80.00%	70.00%	80.83%
4	82.50%	77.50%	83.33%	67.50%	86.66%	73.33%
5	77.50%	76.67%	81.25%	75.00%	66.67%	73.33%
6	70.00%	90.00%	82.08%	32.50%	75.83%	81.66%
7	82.50%	84.17%	83.75%	70.00%	67.50%	79.58%
8	90.00%	92.50%	84.58%	65.00%	80.83%	73.75%
9	80.00%	80.00%	77.50%	42.50%	57.50%	71.25%
10	82.50%	84.17%	82.50%	57.50%	77.50%	78.75%
Avg	80.50%	82.08%	82.38%	63.00%	72.58%	75.83%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

