Energies
  • Article
  • Open Access

19 November 2025

Hybrid Demand Forecasting in Fuel Supply Chains: ARIMA with Non-Homogeneous Markov Chains and Feature-Conditioned Evaluation

Faculty of Civil Engineering, Cracow University of Technology, 31-155 Krakow, Poland
Author to whom correspondence should be addressed.
This article belongs to the Section A: Sustainable Energy

Abstract

In the context of growing data availability and increasing complexity of demand patterns in retail fuel distribution, selecting effective forecasting models for large collections of time series is becoming a key operational challenge. This study investigates the effectiveness of a hybrid forecasting approach combining ARIMA models with dynamically updated Markov Chains. Unlike many existing studies that focus on isolated or small-scale experiments, this research evaluates the hybrid model across a full set of approximately 150 time series collected from multiple petrol stations, without pre-clustering or manual selection. A comprehensive set of statistical and structural features is extracted from each time series to analyze their relation to forecast performance. The results show that the hybrid ARIMA–Markov approach significantly outperforms both individual statistical models and commonly applied machine learning methods in many cases, particularly for non-stationary or regime-shifting series. In 100% of cases, the hybrid model reduced the error compared to both baseline models—the median RMSE improvement over ARIMA was 13.03%, and 15.64% over the Markov model, with statistical significance confirmed by the Wilcoxon signed-rank test. The analysis also highlights specific time series features—such as entropy, regime shift frequency, and autocorrelation structure—as strong indicators of whether hybrid modeling yields performance gains. Feature-conditioning analyses (e.g., lag-1 autocorrelation, volatility, entropy) explain when hybridization helps, enabling a feature-aware workflow that selectively deploys model components and narrows parameter searches. 
The greatest benefits of applying the hybrid model were observed for time series characterized by high variability, moderate entropy of differences, and a well-defined temporal dependency structure—the correlation values between these features and the improvement in hybrid performance relative to ARIMA and Markov models reached 0.55–0.58 and were statistically significant. Such approaches are particularly valuable in enterprise environments dealing with thousands of time series, where automated model configuration becomes essential. The findings position interpretable, adaptive hybrids as a practical default for short-horizon demand forecasting in fuel supply chains and, more broadly, in energy-use applications characterized by heterogeneous profiles and evolving regimes.

1. Introduction

Accurate short-term demand forecasting is a fundamental requirement for inventory control and distribution planning in modern fuel supply chains. In the case of diesel fuel, which is commonly used across industrial, commercial, and transport sectors, forecasting plays a pivotal role in ensuring timely replenishment, improving dispatch and routing decisions, optimizing delivery schedules, reducing logistical costs, and improving energy efficiency. A well-tuned forecasting system enables fuel suppliers to prevent stockouts and overstocking, minimize waste, and reduce CO2 emissions related to suboptimal transport planning. These aspects are especially important in the context of Vendor-Managed Inventory (VMI), where suppliers are responsible for planning deliveries to multiple fuel stations. From a systems perspective, these operational improvements connect directly to energy-use efficiency and sustainability targets in the transport segment.
However, forecasting fuel demand at the station level is particularly challenging due to the stochastic nature of consumption patterns, local events, external shocks (e.g., price changes, weather), and irregular customer behavior. A single supplier may be required to generate forecasts for hundreds of stations, each characterized by a distinct time series with varying statistical properties. These may include differences in trend components, seasonal cycles, variance structures, noise levels, and autocorrelation dynamics. As a result, no single forecasting method is universally effective across all series. This creates a pressing need for automated and adaptive forecasting systems that can dynamically adjust to the characteristics of each time series. Such systems align with current developments in Artificial Intelligence in energy systems design and control, where scalable, data-driven methods are expected to enhance reliability and operational resilience. Moreover, the feature–performance relationships identified in this study can be operationalized as a lightweight pre-screening stage, enabling selective deployment of forecasting components and narrower parameter searches. This has the practical effect of reducing computational load in large-scale settings, which is increasingly relevant given the growing energy and cost footprint of routine model retraining.
Traditional statistical methods, such as ARIMA (AutoRegressive Integrated Moving Average) or exponential smoothing, offer interpretable and well-studied frameworks for time series modelling. However, they may struggle to capture regime shifts, nonlinearity, or irregularity in fuel demand. On the other hand, machine learning approaches, including neural networks or gradient boosting models, often require large amounts of data and substantial tuning effort and may lack robustness across diverse time series profiles. Moreover, such methods frequently operate as “black boxes,” making them less transparent for practical decision-making.
To overcome the limitations of individual models, hybrid forecasting approaches have gained increasing attention in recent years. These methods aim to combine the strengths of different models to achieve higher accuracy and robustness. In this study, we propose a dynamic hybrid model that combines an ARIMA component with a stochastic model based on discrete-time Markov chains, designed specifically for fuel demand forecasting. The hybridization is implemented through a weighted combination of forecasts, where the weighting parameter (denoted as α) is optimized in a moving time window by minimizing forecasting error. This approach allows the model to adjust dynamically to changing time series characteristics and data regimes, leveraging the strengths of each component at the right time.
Importantly, the hybrid model does not assume a fixed weighting scheme. Instead, the α parameter is updated iteratively, depending on the recent forecasting performance of the ARIMA and Markov models. This structure raises a natural research question: do certain characteristics of diesel consumption profiles favour ARIMA-based modelling, while others suggest stronger reliance on Markov dynamics? Therefore, the objectives of this study are twofold:
  • To evaluate the forecasting accuracy of the hybrid ARIMA–Markov model compared to its standalone components across a large set of real-world diesel demand time series.
  • To investigate how statistical features of the time series (e.g., seasonality strength, volatility, entropy, autocorrelation) affect the effectiveness of hybridization and under what conditions the hybrid model significantly outperforms its individual counterparts, thereby informing feature-guided model selection in energy-system applications.
The contribution of this work is both methodological and practical. From a methodological perspective, we present a forecasting framework that automatically adapts to diverse time series without manual tuning. From a practical standpoint, the findings can support model selection automation for fuel supply chain operators by indicating, in advance, which series are likely to benefit from hybridization. Moreover, the results have implications for understanding the relationship between time series characteristics and forecasting model performance in high-stakes applications such as fuel logistics. The approach also connects to sustainability analysis in practice by enabling reductions in emergency deliveries and improved truck loading, which are proximate levers for CO2 mitigation in distribution operations.
The remainder of the article is structured as follows. In Section 2, we review relevant literature on hybrid forecasting models and their application in the logistics and energy sectors. Section 3 describes the individual models (ARIMA and Markov chains) and outlines the proposed methodology, including the hybridization mechanism. Section 4 presents the dataset, the feature engineering process, and the experimental methodology. Section 5 presents the empirical results, including forecasting error comparisons and a correlation analysis between time series features and the hybrid model’s behavior. Finally, the last section concludes the study, discusses practical implications, and suggests directions for future research.

3. Methodological Framework

3.1. Basics of Forecasting with Markov Chains

3.1.1. Homogeneous Markov Chains

Markov chains represent a relatively straightforward yet effective tool for modelling processes in which the future state depends solely on the present one, without regard to the sequence of events that led to it. In the context of a petrol station, this means that demand at a given day or hour is primarily determined by demand during the preceding day or hour, rather than by sales patterns observed weeks or months earlier. Cyclical fluctuations in demand can be incorporated by designing an appropriate state structure within the Markov chain, thereby capturing, for instance, peaks in sales during weekends, holidays, or other specific periods. Unlike more advanced forecasting approaches, such as neural networks, Markov chains are relatively easy to interpret and communicate, which facilitates their acceptance and trust among station staff. The essential element of modelling with Markov chains lies in the definition of states and the transition probabilities between them. If latent patterns in the data can be expressed in terms of states and their transitions, then Markov chains can efficiently detect and represent such structures.
A Markov process is defined as a sequence of random variables in which the probability of the next event depends exclusively on the current state. For the purposes of the present analysis, only Markov processes defined on a discrete state space—that is, Markov chains—will be considered. Let us denote by:
X = (X_0, X_1, …, X_t, …, X_{t_max})
a sequence of discrete random variables. The value of the variable X t will be called the state of the chain s k at the moment t. The finite set of all possible states is called the state space S , which can be expressed as:
s ∈ S,  S = {s_1, s_2, …, s_{k−1}, s_k},  k < ∞
It is assumed that the state space S is countable. The discrete timestamps used in the considered problem can be defined as follows:
t ∈ T,  T = {1, 2, …, t_max},  t_max < ∞
Definition 1. 
A sequence of random variables X is a Markov chain if the Markov property holds:
P(X_{t+1} = s_{t+1} | X_t = s_t, X_{t−1} = s_{t−1}, …, X_0 = s_0) = P(X_{t+1} = s_{t+1} | X_t = s_t)  ∀ t ∈ T, ∀ s_0, …, s_t, s_{t+1} ∈ S
Thus, in a Markov chain, the conditional distribution at time t + 1 depends only on the state at time t , and not on the full trajectory of past states.
Definition 2. 
Let P denote a transition matrix of dimension (k × k) with elements {p_{ij} : i, j = 1, …, k}. A sequence of random variables (X_0, X_1, …), taking values in a finite state space S = {s_1, s_2, …, s_{k−1}, s_k}, is a Markov chain with transition matrix P if, for every t and for any i, j ∈ {1, …, k}, the following condition is satisfied:
p_{ij} = P(X_{t+1} = s_j | X_t = s_i)
Here, p_{ij} represents the conditional probability of transitioning from state s_i at time t to state s_j at time t + 1. The elements of the transition matrix satisfy:
∀ t ∈ T:  p_{ij} = P(X_{t+1} = s_j | X_t = s_i)
p_{ij} ≥ 0,  i, j ∈ {1, …, k}
∀ i:  Σ_{j=1}^{k} p_{ij} = 1
Definition 3. 
The Markov chain is homogeneous when, at every time step, it is described by the same transition matrix P; that is, the transition matrix is fixed and does not depend on time.
The specification of the initial state is also crucial in the construction of a Markov chain. Formally, the initial state is the random variable X 0 . Consequently, the process is typically initiated with a probability distribution over the state space.
Definition 4. 
The initial distribution π_0 is defined as the probability vector:
π_0 = (π_0(s_1), π_0(s_2), …, π_0(s_k)),  π_0(s_i) = P{X_0 = s_i},  s_i ∈ S
The distribution of the forecasted state one step ahead is then given by:
π_{t+1} = π_t · P
Note that π_{t+1} and π_t are row vectors; thus, the following recursive formulation applies:
π_1 = π_0 P,  π_2 = π_1 P = π_0 P²,  π_3 = π_2 P = π_0 P³, …,  π_{t+1} = π_0 P^{t+1}
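The recursion above can be illustrated in a few lines of numpy; the transition matrix and initial distribution below are illustrative values, not estimates from the case-study data:

```python
import numpy as np

# Hypothetical 3-state transition matrix P (rows sum to 1) and an initial
# distribution pi_0; both are illustrative, not values fitted to the data.
P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
])
pi_0 = np.array([1.0, 0.0, 0.0])   # the chain starts in state s_1

# One-step update: pi_{t+1} = pi_t . P (row vector times matrix)
pi_1 = pi_0 @ P

# t-step update via the matrix power: pi_t = pi_0 . P^t
pi_3 = pi_0 @ np.linalg.matrix_power(P, 3)
```

Because the chain starts deterministically in s_1, the one-step distribution is simply the first row of P, and every propagated vector remains a valid probability distribution.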
In practice, most applications of Markov chains assume fixed transition probabilities over time, i.e., homogeneous Markov chains (HMCs). However, this assumption may not hold in systems influenced by human behaviour, such as fuel demand, where patterns vary substantially across time horizons (for example, across seasons or phases of business cycles []). For this reason, in the subsequent analysis we adopt heterogeneous or non-homogeneous Markov chains, in which the transition matrix varies with time. Although such models lack a classical stationary distribution, they are nevertheless highly effective in forecasting contexts characterized by dynamic and time-dependent changes. Numerous methods and applications of heterogeneous Markov chains have been reported in the scientific literature, highlighting their practical usefulness [,].

3.1.2. Non-Homogeneous Markov Chains

In real-world systems, data describing stochastic processes such as demand are inherently discrete in nature yet derived from continuous observations, meaning that abrupt fluctuations may occur. This evolving behavior often departs from the assumptions underlying homogeneous Markov chains. When random variation plays a relatively minor role, it is common practice to derive the empirical distribution of observations, as expressed by Equation (11). By subsequently observing the process at a later time t + τ and recording the number of transitions between states, a transition matrix P ( τ ) can be constructed that satisfies the following relationship:
π_{t+τ} = π_t · P(τ)
The matrix:
P(τ) = {p_{τ,ij} : i, j ∈ S}
is referred to as the transition matrix at time τ, where τ ∈ {1, …, n} and n denotes the length of the forecast horizon.
In the case of non-homogeneous Markov chains, the Markov property remains valid, although the transition probabilities may vary over time []. Such heterogeneous formulations are particularly suitable for short-term forecasting, where the transition matrices for successive steps are known or can be estimated in advance. This makes it possible to trace the evolution of the probability distribution of states while accounting for temporal changes in P ( τ ) at each iteration. Short-term forecasting does not require a stationary distribution, since it focuses on predicting state transitions within a limited future horizon. Further theoretical treatments of non-homogeneous Markov chains can be found in [,].
In the present analysis the transition matrix was re-estimated for each new data window, using a rolling analysis approach applied individually to every time series. Each time a new demand observation from the verification period was added to the learning set T , a new transition matrix was computed and used to generate the next state forecast. The resulting forecasted state was subsequently transformed into a point forecast of demand, as described below.
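The rolling re-estimation described above can be sketched as follows; the state sequence, window length, and uniform fallback for unvisited states are illustrative assumptions, since the text does not specify these implementation details:

```python
import numpy as np

def transition_matrix(states: np.ndarray, k: int) -> np.ndarray:
    """Estimate a row-stochastic transition matrix from a sequence of
    integer state labels in {0, ..., k-1} by counting one-step transitions."""
    counts = np.zeros((k, k))
    for i, j in zip(states[:-1], states[1:]):
        counts[i, j] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observed transitions fall back to a uniform distribution
    # (one simple convention; the paper does not specify this detail).
    return np.where(row_sums > 0, counts / np.maximum(row_sums, 1), 1.0 / k)

# Rolling re-estimation: each time a new observation enters the window,
# the matrix is recomputed on the updated window (toy state labels below).
seq = np.array([0, 1, 1, 2, 1, 0, 0, 1])
window = 6
for t in range(window, len(seq) + 1):
    P_t = transition_matrix(seq[t - window:t], k=3)
```

Each P_t is then used for the next one-step state forecast before the window advances, which is what makes the resulting chain non-homogeneous.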
For the implementation of Markov chains in forecasting, the definition of the state space for each time series was a crucial step. The number of states was determined analogously to the procedure used for histogram construction. Each state represents a specific range of demand values, with the width of an interval X w calculated as:
X_w = (max(X) − min(X)) / √N
where m a x ( X ) and m i n ( X ) denote the maximum and minimum values in the series X , and N represents the number of observations. This formula implies that the width of each interval is proportional to the data range divided by the square root of the sample size, leading to narrower intervals for longer series. Although this approach is less conventional, it is particularly useful in modelling fuel demand, where high variability may otherwise produce states with negligible or zero probability of occurrence.
Each demand interval is thus assigned to a unique state from the state space S = { s 1 , s 2 , , s k } , as illustrated conceptually in Figure 1.
Figure 1. The concept of allocating demand intervals to the state space.
From Equation (10), the forecasted state probability distribution π t + 1 can be derived. The corresponding point forecast of demand for the next period was then computed as follows:
Y^P_{t+1} = Σ_{j=1}^{k} π_{t+1}(s_j) · X̂_j
where:
  • π t + 1 ( s j ) denotes the forecasted probability of the time series being in state s j ,
  • X ^ j represents the midpoint of the demand interval assigned to that state.
This formulation assumes that the expected future demand corresponds to the weighted mean of state midpoints, where the weights are given by the forecasted probabilities of each state. Using this procedure, forecasts for all time series were obtained iteratively for each period within the verification horizon.
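The discretization rule and the probability-weighted point forecast can be sketched together as follows; the demand values and the uniform forecast distribution are illustrative, and the handling of boundary values is one possible convention:

```python
import numpy as np

def discretize(series: np.ndarray):
    """Assign each demand value to a state whose interval width follows
    X_w = (max(X) - min(X)) / sqrt(N), i.e., roughly sqrt(N) states."""
    n_states = int(np.ceil(np.sqrt(len(series))))
    edges = np.linspace(series.min(), series.max(), n_states + 1)
    midpoints = (edges[:-1] + edges[1:]) / 2
    # np.digitize maps each value to its interval; clip keeps the maximum
    # value inside the last state (one possible boundary convention).
    states = np.clip(np.digitize(series, edges) - 1, 0, n_states - 1)
    return states, midpoints

def point_forecast(pi_next: np.ndarray, midpoints: np.ndarray) -> float:
    """Probability-weighted mean of state midpoints (cf. Equation (19))."""
    return float(pi_next @ midpoints)

demand = np.array([100., 120., 95., 130., 110., 105., 125., 90., 115.])
states, mids = discretize(demand)
pi = np.ones(len(mids)) / len(mids)   # illustrative uniform state forecast
```

With a uniform forecast distribution the point forecast reduces to the plain mean of the state midpoints; in practice pi would come from the propagated Markov distribution.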

3.2. Fundamentals of Forecasting with ARIMA Models

ARIMA models are grounded in the concept of autocorrelation, that is, the correlation between a variable and its own past values. The fundamental characteristic of ARIMA models lies in the assumption that the value of a variable at time t can be expressed as a linear combination of its previous values at moments t 1 ,   t 2 , , t p , augmented by a random error component. Within this class of models, three fundamental types can be distinguished:
  • Autoregressive models (AR);
  • Moving Average models (MA);
  • Combined Autoregressive–Moving Average models (ARMA).
In general form, an autoregressive model of order p may be represented as:
Y_t = φ_0 + φ_1 Y_{t−1} + φ_2 Y_{t−2} + … + φ_p Y_{t−p} + e_t
where
  • Y_t, Y_{t−1}, Y_{t−2}, …, Y_{t−p}—the values of the variable at times t, t − 1, t − 2, …, t − p;
  • φ_0, φ_1, φ_2, …, φ_p—model parameters;
  • e_t—the value of the random component in period t;
  • p—the lag order.
The lag parameter p determines the number of previous time steps considered when estimating the variable’s current value. When the random errors from preceding periods are correlated, the process is modelled using a moving average (MA) structure, defined as:
Y_t = ϑ_0 − ϑ_1 e_{t−1} − ϑ_2 e_{t−2} − … − ϑ_q e_{t−q} + e_t
where:
  • e_t, e_{t−1}, e_{t−2}, …, e_{t−q} represent the model residuals at times t, t − 1, t − 2, …, t − q;
  • ϑ_0, ϑ_1, ϑ_2, …, ϑ_q are the model parameters;
  • and q denotes the order of the moving average.
To achieve a better fit to historical data, the autoregressive and moving average components are often combined, forming the ARMA model, which incorporates both p and q parameters. The general form of an ARMA( p , q ) model is given by:
Y_t = φ_0 + φ_1 Y_{t−1} + φ_2 Y_{t−2} + … + φ_p Y_{t−p} + e_t + ϑ_0 − ϑ_1 e_{t−1} − ϑ_2 e_{t−2} − … − ϑ_q e_{t−q}
This formulation enables the model to simultaneously capture both the persistence of the time series (through autoregressive terms) and the influence of random shocks (through moving average terms).
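As an illustration of the AR component, a minimal least-squares fit of the AR(p) equation can be written in a few lines of numpy; this is a didactic sketch, not the maximum-likelihood estimation typically used by ARIMA software:

```python
import numpy as np

def fit_ar(y: np.ndarray, p: int) -> np.ndarray:
    """Least-squares fit of Y_t = phi_0 + phi_1 Y_{t-1} + ... + phi_p Y_{t-p} + e_t.
    Returns [phi_0, phi_1, ..., phi_p]."""
    rows = [np.r_[1.0, y[t - p:t][::-1]] for t in range(p, len(y))]
    X = np.array(rows)                       # design matrix: intercept + p lags
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef

def forecast_ar(y: np.ndarray, coef: np.ndarray) -> float:
    """One-step-ahead forecast from the last p observations."""
    p = len(coef) - 1
    return float(coef[0] + coef[1:] @ y[-p:][::-1])

# Noise-free AR(1) series with phi_1 = 0.8: the fit recovers the parameter.
y = 0.8 ** np.arange(20)
coef = fit_ar(y, p=1)
```

The MA terms require iterative estimation of unobserved residuals and are therefore omitted here; in practice a library implementation handles the full ARIMA(p, d, q) specification.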

4. Research Design

4.1. Methodology Framework

The methodological framework proposed in this study aims to investigate and evaluate a hybrid forecasting approach that integrates deterministic and stochastic modelling principles. The design of the method reflects the assumption that no single model can adequately capture all aspects of complex temporal behavior, particularly in the case of fuel demand, where both regular seasonal patterns and irregular fluctuations coexist. To address this challenge, the proposed framework combines the predictive strength of an ARIMA model, which captures linear temporal dependencies, with a Markov-based component that represents probabilistic state transitions and stochastic dynamics. Importantly, the hybrid model does not rely on a fixed weighting scheme. Instead, the contribution of each component is governed by an adaptive coefficient α , which is updated iteratively according to the recent forecasting performance of both models. This adaptive mechanism introduces an additional layer of learning, allowing the hybrid structure to respond dynamically to changing time series characteristics.
In order to achieve the aims of the research, the methodological pipeline was structured into four principal phases, as illustrated in the accompanying schematic diagram (see Figure 2):
Figure 2. Methodology framework.
  • Input data and feature extraction phase, which involves loading, preprocessing, and quantifying descriptive characteristics of the analyzed time series;
  • Forecasting phase, in which the hybrid ARIMA–Markov model is trained and applied within a rolling prediction framework;
  • Hybrid model verification phase, where model performance is assessed using multiple accuracy metrics and comparative analysis; and
  • Final outcome, which synthesizes the empirical findings and interprets the behavior of the adaptive weighting parameter α in the context of time series features.
This multi-stage structure provides a coherent methodological foundation for exploring both the predictive efficiency and the interpretability of the proposed hybrid approach.
In the first stage of the study, a representative set of time series was selected, representing the daily consumption of diesel fuel at petrol stations. The data were obtained from a telemetry system monitoring the fuel levels in storage tanks. Before proceeding with further analyses, all time series were subjected to a cleaning and validation process—outliers were removed, any missing values were filled in, and the data were transformed into a uniform format suitable for further modeling.
Next, a set of statistical features describing the structural properties of each series was extracted. The analyzed features included measures of variability (such as standard deviation and coefficient of variation), autocorrelation (the value of the autocorrelation function at lag 1), entropy as a measure of the randomness of the time series, as well as kurtosis and skewness, which characterize the distribution of values. In addition, selected indicators of complexity and non-linearity were considered.
The feature extraction was intended not only to gain a better understanding of the data characteristics but also to enable further analysis of how these features influence forecast accuracy. The list of features considered in the study is presented in Table 1.
Table 1. List of time series features.
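A subset of such features can be computed directly with numpy; the selection and the histogram bin count below are illustrative assumptions, since Table 1 defines the full feature list used in the study:

```python
import numpy as np

def series_features(y: np.ndarray) -> dict:
    """An illustrative subset of the descriptive features (cf. Table 1)."""
    mean, std = y.mean(), y.std()
    z = (y - mean) / std
    # Shannon entropy of a 10-bin histogram of the values (bin count assumed)
    counts, _ = np.histogram(y, bins=10)
    p = counts[counts > 0] / counts.sum()
    return {
        "std": float(std),
        "cv": float(std / mean),                    # coefficient of variation
        "acf_lag1": float(np.corrcoef(y[:-1], y[1:])[0, 1]),  # lag-1 autocorrelation
        "entropy": float(-(p * np.log(p)).sum()),
        "skewness": float((z ** 3).mean()),
        "kurtosis": float((z ** 4).mean() - 3.0),   # excess kurtosis
        "diff_std": float(np.diff(y).std()),        # volatility of differences
    }

# A smooth periodic toy series: strong lag-1 dependence, moderate entropy.
feats = series_features(np.sin(np.linspace(0, 6 * np.pi, 200)) + 5.0)
```

Computed once per series, such a feature vector supports both the correlation analysis and the PCA visualization described later.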
In the second phase, three categories of forecasting models were developed: the ARIMA model, the Markov model, and the hybrid ARIMA–Markov model. ARIMA models were re-estimated for each series on a moving training window. In parallel, Markov chain models were built by transforming the time series into sequences of discrete demand states and estimating transition matrices within each moving window, thereby forming heterogeneous Markov chains capable of reflecting local regime changes. Finally, the hybrid model linearly combined the forecasts from both components according to the adaptive weighting formula described in the following section, where the detailed implementation of the forecasting phase is presented.
In the third stage, the forecasting accuracy of each of the three models was evaluated. For each time series and each model, ex-post error values were calculated (such as MAE, SMAPE, RMSE). Based on these metrics, a correlation analysis was then performed to examine the relationships between the previously extracted time series features and the obtained forecasting errors. The analysis was conducted independently for the ARIMA model, the Markov chain model, and the hybrid model. The objective was to identify which data features contribute to high forecasting accuracy for a given model, as well as to determine when and for which types of time series the hybrid model outperforms the individual approaches. Particular attention was paid to identifying those time series profiles for which the application of the hybrid approach yields the greatest forecasting benefits.
In the final phase of the study, conclusions were drawn and practical recommendations were formulated regarding the selection of forecasting methods based on the characteristics of time series. Based on the conducted analyses, guidelines were proposed to identify those types of series for which the hybrid model offers a significant improvement over individual methods. The study also highlighted the potential for implementing these findings in decision support systems in fuel logistics, particularly in adaptive forecasting modules that automatically select the appropriate forecasting method depending on the current characteristics of the data.

4.2. Hybrid Forecasting Framework Based on ARIMA–Markov Chains Linear Combination

Time series of fuel demand, including diesel fuel, are characterized by high levels of variability, seasonality, and the presence of local anomalies—such as sudden surges or drops resulting from promotional campaigns, weather changes, or the specific location of the fuel station. In practice, no single forecasting method proves universally effective. Forecasting accuracy in complex, dynamic systems can often be improved by combining models that capture different aspects of temporal behavior. The proposed forecasting framework integrates a classical ARIMA model with a Markov chain-based stochastic model, using a dynamic linear combination mechanism to balance their relative influence over time. The main purpose of this hybrid approach is to exploit both the deterministic structure captured by ARIMA and the stochastic transitions modelled by the Markov process, while allowing the relationship between the two to adapt dynamically as new data become available.
The framework follows a rolling forecasting scheme, in which both models are re-estimated on a moving training window. This approach enables the forecasting system to remain responsive to evolving data patterns, such as changes in demand levels or structural breaks in the time series. The flowchart for the hybrid ARIMA–Markov linear-combination pipeline is presented in Figure 3.
Figure 3. Flowchart for the hybrid ARIMA–Markov linear-combination pipeline.
The input data consist of multiple time series, each representing a distinct object or observation category, such as fuel demand at an individual station. Once the data are prepared, the model parameters are specified, including the size of the rolling training window, the forecasting horizon (typically one step ahead), and the number of discrete states used to represent the Markov process.
The forecasting procedure is performed iteratively across all time series, allowing model parameters to be updated continuously as new observations become available. At each iteration, an ARIMA model is fitted to the most recent data segment, producing a one-step-ahead forecast ŷ_{t+1}^{ARIMA}. The corresponding residual e_t^{ARIMA} = y_t − ŷ_t^{ARIMA} is retained as a measure of model performance and as an indicator of potential non-linear behavior not captured by the autoregressive structure.
In parallel, a Markov chain model is constructed using recent observations. The values within the training window are discretized into a finite number of bins, each representing a distinct system state. Transitions between these states form a transition probability matrix, where each element p i j denotes the probability of moving from state s i to state s j . Given this matrix, the expected next-period value is obtained as the probability-weighted average of the state midpoints (see Equation (19)), yielding the one-step-ahead Markov forecast y ^ t M A R K O V . This component captures stochastic and local change patterns that complement the deterministic behaviour represented by the ARIMA model.
A central element of the framework is the dynamic linear combination of the ARIMA and Markov forecasts. Instead of using a fixed weighting scheme, an optimal coefficient α_t is determined at each time step by solving a constrained optimisation problem that minimises the mean squared forecasting error:
min_{α_t} ( y_t − [ α_t · ŷ_t^{ARIMA} + (1 − α_t) · ŷ_t^{MARKOV} ] )²
subject to 0 α t 1 . The formulation ensures a convex combination of both forecasts. This procedure allows the model to adaptively adjust the relative influence of each component, assigning greater weight to the one that performs better within the current rolling window.
The final hybrid forecast is expressed as:
Ŷ_{t+1}^{HYBRID} = α_t · Ŷ_{t+1}^{ARIMA} + (1 − α_t) · Ŷ_{t+1}^{MARKOV}
This formulation may be interpreted as a form of online learning, where the combination rule evolves continuously in response to recent forecasting performance.
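Because the weighting objective is a one-dimensional convex quadratic in α_t, the constrained optimum over a window of past forecasts can be obtained in closed form by clipping the unconstrained least-squares solution; the sketch below assumes such a window of paired one-step forecasts from both components:

```python
import numpy as np

def optimal_alpha(y, f_arima, f_markov):
    """Closed-form minimiser of sum_t (y_t - [a*f_arima_t + (1-a)*f_markov_t])^2
    over a in [0, 1]: a 1-D convex quadratic, so the unconstrained
    least-squares solution is simply clipped to the unit interval."""
    d = f_arima - f_markov
    denom = (d ** 2).sum()
    if denom == 0.0:                  # identical forecasts: the weight is moot
        return 0.5
    alpha = ((y - f_markov) * d).sum() / denom
    return float(np.clip(alpha, 0.0, 1.0))

def hybrid_forecast(alpha, f_arima_next, f_markov_next):
    """Convex combination producing the next-period hybrid forecast."""
    return alpha * f_arima_next + (1 - alpha) * f_markov_next

# Toy window: the ARIMA forecasts are exact, the Markov forecasts are biased,
# so the optimal weight shifts entirely to the ARIMA component.
y_obs = np.array([10.0, 12.0, 11.0])
f_a, f_m = y_obs.copy(), y_obs + 3.0
alpha = optimal_alpha(y_obs, f_a, f_m)
```

Recomputing α_t at every step of the rolling window is what gives the scheme its online-learning character.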
Once all rolling windows have been processed, the forecasting performance is assessed using standard accuracy metrics, including Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), and Symmetric Mean Absolute Percentage Error (SMAPE).
The SMAPE (Symmetric Mean Absolute Percentage Error) metric was used in the conducted study as the primary forecast error indicator in the correlation analysis, replacing the traditional MAPE (Mean Absolute Percentage Error). The choice of SMAPE was motivated by several considerations, outlined below. SMAPE divides the absolute forecast error not by the actual value (as in MAPE), but by the arithmetic mean of the actual and predicted values. As a result, the metric treats positive and negative errors more symmetrically, regardless of the direction of the forecast deviation. Unlike MAPE, which deteriorates when the actual value Y_t → 0 (since dividing by values close to zero leads to extremely large or unbounded errors), SMAPE mitigates this problem because the denominator is not based solely on the actual value. Additionally, SMAPE is expressed as a percentage, which enables intuitive interpretation and facilitates the comparison of errors across time series with different scales. In the analyzed fuel demand time series, significant fluctuations in daily consumption were observed; in some cases, the values were very low (e.g., weekend drops in demand), which made MAPE unstable and sensitive to extreme values, since denominators close to zero lead to disproportionately high errors. In contrast, SMAPE remains stable even in the presence of small or zero values, does not favor models that consistently underpredict, and provides a more reliable basis for comparing the accuracy of different models.
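The two metrics can be stated compactly; note that SMAPE has several variants in the literature, and the version below (mean of the actual and forecast magnitudes in the denominator, bounded by 200%) is the common two-sided form consistent with the description above:

```python
import numpy as np

def mape(y, y_hat):
    """Classic MAPE: unstable when actual values approach zero."""
    return 100.0 * np.mean(np.abs(y - y_hat) / np.abs(y))

def smape(y, y_hat):
    """Two-sided SMAPE: the denominator is the mean of the actual and forecast
    magnitudes, so the metric stays bounded (0-200%) for near-zero actuals."""
    return 100.0 * np.mean(2.0 * np.abs(y - y_hat) / (np.abs(y) + np.abs(y_hat)))

# A single near-zero actual (1.0) inflates MAPE far more than SMAPE.
y = np.array([100.0, 1.0, 50.0])
y_hat = np.array([110.0, 10.0, 45.0])
```

Evaluating both metrics on the toy arrays makes the instability concrete: the single low-demand observation dominates MAPE while SMAPE stays bounded.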
For each time series and model variant (ARIMA, Markov, and hybrid), the results are summarized and compared. The analysis includes both quantitative evaluation of accuracy and qualitative examination of the time-varying coefficient α t . These adaptive weights provide insight into the relative importance of deterministic and stochastic components over time and across different demand profiles, offering a deeper understanding of how the hybrid model adapts to varying data characteristics.

5. Results for the Case Study

5.1. Dataset Characteristics

The data used in the study come from 147 different gas stations and represent the historical daily demand for diesel fuel in the period from 1 January 2023 to 31 December 2023. The observed time series values are in liters. The analyzed time series are characterized by high volatility due to the large number of factors affecting the demand for diesel fuel at gas stations (day of the week, seasonality, weather, location, etc.). In order to analyze the characteristics of the time series, a defined set of features was determined for each of them (see Table 1). On this basis, a preliminary assessment of the similarity of the series can be made in order to create demand profiles, which will enable the development of recommendations for selecting the most effective forecasting model.
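A subset of such series features can be computed as in the sketch below. These are common formulations chosen for illustration; the exact definitions follow Table 1 and may differ, and `extract_features` is a hypothetical helper applied to a synthetic demo series:

```python
import numpy as np

def extract_features(y, n_lags=7, n_bins=10):
    """Illustrative subset of time-series features (common formulations)."""
    y = np.asarray(y, float)
    feats = {
        "mean": float(y.mean()),
        "std": float(y.std(ddof=1)),
        "cv": float(y.std(ddof=1) / y.mean()),   # coefficient of variation
    }
    # Mean autocorrelation over lags 1..7
    yc = y - y.mean()
    acf = [np.dot(yc[:-k], yc[k:]) / np.dot(yc, yc) for k in range(1, n_lags + 1)]
    feats["mean_acf_1_7"] = float(np.mean(acf))
    # Shannon entropy of the histogram of first differences
    counts, _ = np.histogram(np.diff(y), bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    feats["entropy_diff"] = float(-(p * np.log(p)).sum())
    return feats

rng = np.random.default_rng(0)
# Synthetic daily demand with a weekly cycle, standing in for a station's series
demo = 5000 + 800 * np.sin(2 * np.pi * np.arange(365) / 7) + rng.normal(0, 150, 365)
f = extract_features(demo)
print(sorted(f))
```

Computing such a vector per series yields the feature matrix that the PCA and correlation analyses below operate on.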
To provide an aggregated visualization of the structure of the analyzed time series in terms of their statistical properties, a Principal Component Analysis (PCA) was conducted. This technique allows for dimensionality reduction in multivariate data while retaining as much of the total variance of the original variables as possible. Each time series was represented in a 15-dimensional feature space describing its structure (e.g., measures of variability, autocorrelation, entropy, skewness, kurtosis, etc.). As part of the PCA procedure, all features were first standardized (mean equal to 0, standard deviation equal to 1) to eliminate the influence of differences in scale among the variables. Then, a covariance matrix was computed, from which the eigenvalues and corresponding eigenvectors were extracted. These eigenvectors define the new axes of the principal components—PCA1, PCA2, and so on—which are linear combinations of the original features. The data were then projected onto these new axes, transforming them into the PCA coordinate space. A scatter plot in the space of the first two principal components (PCA1 and PCA2), presented in Figure 4, enables a visual assessment of the dispersion of the analyzed series in the feature space.
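The PCA procedure described above (standardization, covariance matrix, eigen-decomposition, projection) can be sketched as follows. The feature matrix here is a random synthetic stand-in for the 147 × 15 matrix used in the study:

```python
import numpy as np

def pca_project(X, n_components=2):
    """Standardize the feature matrix, then project onto the leading
    principal components via eigen-decomposition of the covariance matrix."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)     # mean 0, std 1 per feature
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]            # sort descending by variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs[:, :n_components]       # PCA1, PCA2 coordinates
    explained = eigvals / eigvals.sum()          # explained variance ratios
    return scores, explained

rng = np.random.default_rng(1)
X = rng.normal(size=(147, 15))                   # synthetic 147-by-15 feature matrix
X[:, 1] = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=147)  # correlated pair
scores, explained = pca_project(X)
print(scores.shape, round(float(explained[:2].sum()), 2))
```

The scatter of `scores` corresponds to the PCA1/PCA2 plot in Figure 4, and `explained` gives the per-component variance shares reported below.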
Figure 4. Data PCA analysis.
The first principal component (PCA1) explains 39% of the total variance, while the second component (PCA2) accounts for 25.5%, resulting in a cumulative explained variance of over 64% in two dimensions. This allows for an efficient and dimensionally reduced presentation of the structure of the studied population of time series. The analysis of the plot suggests that most of the time series are concentrated in the central region of the principal component space, indicating a relative homogeneity of the dataset in terms of structural characteristics. However, some series exhibit more extreme values along PCA1 or PCA2, potentially representing outliers or specific types of demand patterns (e.g., noticeably higher variability, strong seasonality, or low autocorrelation). The distributions of all analyzed characteristics in the entire time series set are presented in Figure 5.
Figure 5. Time series features distributions.
Based on Figure 5, one can conclude that the variability measures (variance, standard deviation, and coefficient of variation) as well as local heteroskedasticity (variability of the local standard deviation) indicate significant differences between time series, manifested in long distribution tails. This suggests the existence of series with substantially higher dynamics than the average. Also noteworthy are the distributions of the number of regime changes and the variance of local means, which in most cases take low values but also highlight the presence of series with distinct jumps or trend shifts. The seasonality characteristic (maximum ACF calculated over 30 lags) reveals diverse strength of cyclical patterns, which may be relevant when selecting an appropriate forecasting model. Overall, the observed feature distributions confirm the existence of both typical and atypical time series within the dataset, which justifies the use of a hybrid approach capable of adapting to the local properties of the data.

5.2. Detailed Forecasting Results

This section illustrates the forecasting behavior of the proposed approach on selected examples before turning to aggregate evidence. Given the large number of time series analyzed and the rolling-window estimation scheme (which generates a distinct model at each verification step), an exhaustive display of all trajectories is impractical. Instead, representative cases are shown to convey typical dynamics and error patterns; the subsequent subsections summarize performance across all series using aggregated statistics and distributional comparisons.
Figure 6 presents four representative series chosen to span contrasting regimes of seasonality (low vs. high) and volatility (low vs. high). For each panel, the last 60 verification points are shown with actuals (solid), hybrid (solid, emphasized), ARIMA (dashed), and Markov (red, dotted). Across regimes, the hybrid trajectory typically adheres more closely to the actual demand path than either baseline, particularly around local turning points and short-lived deviations, while ARIMA tends to smooth transitions and the Markov component captures discrete shifts with greater sensitivity. These examples are intended to illustrate the qualitative behavior of the models under different signal conditions rather than to provide definitive evidence.
Figure 6. Representative series by seasonality and volatility.
The remainder of the analysis reports aggregated results: distributional plots of error metrics, per-series improvements of the Hybrid model relative to its ARIMA and Markov baselines, rank-based comparisons, and conditioning analyses that relate hybrid gains to measurable time-series properties. This structure provides a balanced view of typical performance, heterogeneity across series, and the conditions under which hybridization offers the greatest benefit.
Aggregate performance was evaluated by summarizing errors across all series and contrasting the Hybrid specification with ARIMA and Markov baselines via distributional, per-series, and rank-based comparisons. Figure 7 presents the distribution of forecast errors for MAE, RMSE, MAPE, and SMAPE across models.
Figure 7. Distribution of ex-post forecast errors across models for verification data sets (top-left: MAE, top-right: RMSE, bottom-left: MAPE, bottom-right: SMAPE).
The hybrid model achieves consistently lower central errors and tighter interquartile ranges (IQR) than both baselines. For MAE, the median error is 693.97 with an IQR of 552.20–940.71, compared with ARIMA (874.99, IQR 678.84–1156.26) and Markov (897.10, IQR 692.71–1201.16). A similar pattern is observed for RMSE (Hybrid: 933.27, IQR 739.65–1260.46; ARIMA: 1089.64, 844.59–1451.11; Markov: 1104.13, 883.16–1525.86). The advantage also holds for percentage-based measures: MAPE (Hybrid median 19.85 vs. ARIMA 24.56 and Markov 24.90) and SMAPE (Hybrid 9.00 vs. ARIMA 10.96, Markov 11.38). These results indicate that the hybrid approach reduces both typical error magnitude and dispersion across heterogeneous demand profiles.
The hybrid model improves upon ARIMA in 100% of cases (147/147 series), with a median error reduction of 18.67% (IQR 16.53–21.83%). Relative to the Markov chain approach, the hybrid model again improves on 100% of series, with a median reduction of 20.54% (IQR 18.42–24.34%). An analogous analysis for RMSE yields consistent conclusions: the hybrid model improves over ARIMA in 100% of series (median reduction 13.03%, IQR 11.21–16.13%) and over Markov in 100% of series (median reduction 15.64%, IQR 13.65–18.90%). Paired Wilcoxon signed-rank tests indicated statistically significant improvements of the hybrid model over both ARIMA and Markov across all series (maximum p-value = 7.15 × 10⁻²⁶, N = 147).
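The paired Wilcoxon comparison can be reproduced in outline. The sketch below implements the one-sided signed-rank test with a normal approximation (`wilcoxon_signed_rank` is an illustrative helper, and the RMSE values are synthetic stand-ins mimicking the reported reductions, not the study's data):

```python
import math
import numpy as np

def wilcoxon_signed_rank(x, y):
    """One-sided paired Wilcoxon signed-rank test (normal approximation):
    tests whether x is systematically smaller than y."""
    d = np.asarray(x, float) - np.asarray(y, float)
    d = d[d != 0.0]
    n = len(d)
    ranks = np.argsort(np.argsort(np.abs(d))) + 1  # ranks of |d| (no tie correction)
    w_pos = float(ranks[d > 0].sum())              # rank sum of positive differences
    mu = n * (n + 1) / 4.0
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_pos - mu) / sigma
    p = 0.5 * math.erfc(-z / math.sqrt(2.0))       # P(Z <= z)
    return w_pos, p

rng = np.random.default_rng(42)
rmse_arima = rng.uniform(800.0, 1500.0, size=147)             # synthetic per-series RMSE
rmse_hybrid = rmse_arima * rng.uniform(0.80, 0.92, size=147)  # hybrid lower for every series
w, p = wilcoxon_signed_rank(rmse_hybrid, rmse_arima)
print(w, p)  # w = 0.0 when the hybrid wins every pairing; p on the order of 1e-26
```

When one model wins every pairing, the positive rank sum is zero and the p-value collapses to the 10⁻²⁶ scale, matching the order of magnitude reported above.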
As a further analysis, an empirical cumulative distribution function (ECDF) comparison was conducted to assess relative error levels across all series (see Figure 8). The ECDF, which plots the cumulative proportion of observations up to a given value, was applied to the ratios RMSE_Hybrid/RMSE_ARIMA and RMSE_Hybrid/RMSE_Markov. Values below 1.0 indicate that the hybrid model attains a lower error than the respective baseline.
Figure 8. Empirical cumulative distribution function for relative error levels across all series (left: RMSE Hybrid/RMSE ARIMA, right: RMSE Hybrid/RMSE Markov Chain).
The ECDF for RMSE_Hybrid/RMSE_ARIMA lies entirely to the left of 1.0, demonstrating improvement for all series. The curve rises steeply around its centre, indicating concentrated gains rather than effects driven by a small number of outliers. Quantitatively, the median RMSE reduction relative to ARIMA equals 13.03% (IQR 11.21–16.13%), corresponding to a median ratio of approximately 0.87. An analogous pattern is observed for the comparison with Markov: the ECDF of RMSE_Hybrid/RMSE_Markov also lies fully left of 1.0, with a median reduction of 15.64% (IQR 13.65–18.90%) and a median ratio near 0.84. These curves therefore indicate universal (all-series) and consistent (narrow-IQR) benefits of the hybrid approach over both baselines.
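The ECDF construction for the error ratios can be sketched as follows, using hypothetical ratios standing in for the per-series values:

```python
import numpy as np

def ecdf(values):
    """Return sorted values and the cumulative proportion at or below each one."""
    x = np.sort(np.asarray(values, float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

rng = np.random.default_rng(7)
# Synthetic stand-in for the per-series ratios RMSE_Hybrid / RMSE_ARIMA
ratios = rng.uniform(0.80, 0.95, size=147)
x, y = ecdf(ratios)
# Proportion of series with a ratio below 1.0 (hybrid better than baseline)
share_below_one = y[np.searchsorted(x, 1.0, side="right") - 1]
print(share_below_one, round(float(np.median(ratios)), 2))
```

A share of 1.0 at the threshold means the entire curve lies left of 1.0, which is the "universal improvement" reading applied to Figure 8.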

5.3. Correlation Analysis

Based on the accuracy results obtained from all models (ARIMA, Markov chains, Hybrid Model) expressed by the SMAPE error, Spearman’s correlation coefficients were calculated between SMAPE for each time series and the corresponding feature values. Additionally, in order to obtain information on which features may positively influence the benefits of using the hybrid model, a new variable was defined using the following formula:
Imp_SMAPE_ARIMA = SMAPE_ARIMA − SMAPE_HYBRID
Imp_SMAPE_MARKOV = SMAPE_MARKOV − SMAPE_HYBRID
These variables (Imp_SMAPE_ARIMA, Imp_SMAPE_MARKOV) describe the improvement in the accuracy of forecasts obtained from the hybrid model compared to the errors obtained from the ARIMA models and Markov chains used separately. This also made it possible to determine which features of the series have a positive impact on the benefits of hybridization. The matrix with correlation coefficient values for all pairs is shown in Figure 9.
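The improvement variables and their Spearman correlations with series features can be computed as in the sketch below. The data are synthetic (the assumed dependence of errors and gains on entropy is for illustration only), and `spearman` is an illustrative rank-based implementation:

```python
import numpy as np

def spearman(a, b):
    """Spearman's rho: Pearson correlation of the rank-transformed variables."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

rng = np.random.default_rng(3)
n = 147
entropy_diff = rng.uniform(1.0, 3.0, n)                        # hypothetical feature values
smape_arima = 15.0 + 5.0 * entropy_diff + rng.normal(0, 2, n)  # harder series, larger errors
# Assume hybrid gains grow with series complexity (illustration only)
smape_hybrid = smape_arima - (1.0 + entropy_diff) * rng.uniform(0.8, 1.2, n)
imp_arima = smape_arima - smape_hybrid                         # Imp_SMAPE_ARIMA

rho_err = spearman(entropy_diff, smape_arima)  # positive: entropy raises errors
rho_imp = spearman(entropy_diff, imp_arima)    # positive: entropy raises hybrid gains
print(round(rho_err, 2), round(rho_imp, 2))
```

Repeating this for every feature and error column produces the correlation matrix of Figure 9.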
Figure 9. Features correlation analysis.
Figure 9 presents the Spearman correlation matrix between the analyzed time series statistical features and the forecasting errors (SMAPE) obtained from the ARIMA, Markov chain, and hybrid models. Additionally, the matrix includes the correlations between the features and the relative improvement of the hybrid model over each of the base models. The analysis revealed that the strongest positive correlations with forecasting errors (across all three models) are observed for features describing both global and local variability: variance, standard deviation, coefficient of variation (CV), and variability of the local standard deviation (heteroskedasticity). This indicates that greater instability in a time series translates into increased forecasting difficulty, regardless of the method used. On the other hand, features characterizing the distribution structure of the data, such as kurtosis and skewness, show negative correlations with forecasting errors, particularly for the ARIMA model. This suggests that such models perform better on series with more symmetric and less heavy-tailed distributions. Particular attention should be paid to features related to the temporal dependence structure, such as the mean ACF (1:7) and mean PACF (1:7), which measure the average strength of lagged dependencies in the time series. These features exhibit consistent negative correlations with the forecasting errors of all models (ranging from approximately −0.51 to −0.56), indicating that the presence of strong temporal dependencies improves forecasting accuracy. In other words, the models perform better on time series that exhibit regularity and cyclic behavior in demand patterns. The entropy of differences, interpreted as a measure of complexity and unpredictability, shows a moderate positive correlation with forecasting errors (up to 0.34 for the ARIMA model).
High entropy may indicate the presence of chaotic or irregular changes in the series, which lead to increased forecast error. Conversely, a series with low entropy in its differences is easier to predict due to reduced random fluctuations. The analysis of correlations between the features and the performance improvement of the hybrid model (compared to ARIMA and Markov) suggests that the greatest benefits of hybridization occur for time series characterized by high variability, but also in cases where entropy is moderate and the structure of temporal dependencies is well defined. These correlations reach values around 0.55–0.58, indicating statistically meaningful relationships.
In order to determine which of the designated correlation coefficient values are statistically significant, a Student’s t-test was performed. Table 2 shows the corresponding p-values. An alpha value of 0.05 was adopted as the significance level.
Table 2. p-values from significance testing of Spearman correlations between input features and forecasting accuracy (α = 0.05).
For most of the analyzed relationships, the p-values are lower than the adopted significance level of α = 0.05, which confirms that the values of the calculated correlation coefficients for the selected features are statistically significant. Statistical significance has been particularly confirmed for features such as: Entropy of differences, Mean PACF(1:7), Mean ACF(1:7), as well as global and local variability measures—including variance, standard deviation, coefficient of variation (CV), and local standard deviation variability. All these features show significant associations (p < 0.001), confirming their relevance in the context of forecasting difficulty. In the case of the hybrid model, statistically significant correlations are observed for the majority of features, which may indicate its ability to adapt to a wide range of structural properties of time series data.
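The significance test for a Spearman coefficient uses the t-approximation t = ρ·sqrt((n − 2)/(1 − ρ²)) with n − 2 degrees of freedom. The sketch below replaces the t-distribution with a standard normal, which is adequate at n = 147 (`spearman_p_value` is an illustrative helper):

```python
import math

def spearman_p_value(rho, n):
    """Two-sided p-value for a Spearman coefficient via the t-approximation
    t = rho * sqrt((n - 2) / (1 - rho**2)); the t-distribution with n - 2
    degrees of freedom is approximated by a standard normal (fine for n = 147)."""
    t = rho * math.sqrt((n - 2) / (1.0 - rho ** 2))
    return math.erfc(abs(t) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|t|))

# With n = 147, coefficients around |rho| = 0.17 already cross alpha = 0.05,
# so the moderate correlations reported above are comfortably significant.
print(spearman_p_value(0.55, 147) < 0.001)  # True
print(spearman_p_value(0.17, 147))
```

This explains why even the moderate coefficients in Figure 9 yield p-values far below the 0.05 threshold in Table 2.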

6. Conclusions

This study introduced a dynamic hybrid forecasting framework that linearly combines an ARIMA component with a discrete-time Markov chain, with the combining weight updated adaptively in a rolling window. Using daily diesel demand from 147 stations, the approach was benchmarked against its standalone constituents. Across all series and error metrics, the Hybrid specification delivered consistent and practically meaningful improvements: central errors were reduced and interquartile ranges tightened relative to both ARIMA and Markov baselines. Empirical cumulative distribution analyses showed error ratios below unity for every series, and paired non-parametric tests confirmed statistical significance. These results indicate that hybridization enhances both accuracy and reliability in short-horizon fuel demand forecasting.
From an energy-systems perspective, the gains are operationally relevant. More accurate short-term forecasts can stabilize Vendor-Managed Inventory operations, reduce safety stocks and emergency replenishments, improve vehicle routing and load factors, and thereby curb transport-related CO2 emissions and costs. Because the framework remains lightweight and interpretable, it lends itself to deployment in industrial settings where transparency and rapid retraining are essential.
The analysis also clarified when hybridization helps. Conditioning on time-series characteristics showed that stronger short-range dependence (e.g., higher lag-1 autocorrelation) and the coexistence of seasonality with moderate-to-high volatility are favorable regimes. In these settings, the adaptive weighting exploits linear predictability through ARIMA while the Markov component absorbs residual regime shifts. Correlation analyses were consistent with this mechanism: measures of temporal dependence were negatively associated with forecast errors across all models, whereas variability and entropy increased difficulty; the Hybrid’s advantage over each baseline generally grew with variability provided that dependency structure was not overwhelmed by noise.
An additional advantage is computational: by exploiting the observed associations between features and accuracy gains, many routine fits can be avoided or warm-started (e.g., constrained ARIMA orders, adaptive state binning, or reduced retraining frequency). This not only shortens wall-clock time but also lowers the energy demand of continuous forecasting operations—an aspect aligned with sustainable computing in energy logistics.
Methodologically, the framework balances interpretability and adaptivity without resorting to data-hungry “black boxes”. The study reports sufficient implementation detail (rolling re-estimation, feature extraction, and model selection procedures) to enable reproduction and transfer to related fuels or energy commodities. This makes the approach a pragmatic candidate for AI in energy systems design and control, particularly where many heterogeneous demand profiles must be forecast concurrently.
There are limitations that open avenues for future work. First, only univariate, one-step-ahead point forecasts were considered; extensions to multi-horizon, probabilistic, and intraday settings would broaden applicability. Second, the state discretization in the Markov layer was data-driven but fixed within windows; learned or adaptive partitions may capture non-linearities more effectively. Third, only endogenous information was used; integrating exogenous drivers (prices, weather, calendar effects) and explicit regime-detection could further stabilize performance during structural breaks. Finally, linking forecast distributions directly with downstream inventory and routing optimization would close the loop from prediction to decision, supporting system-level objectives such as cost and emission minimization.
In summary, the proposed ARIMA–Markov hybrid provides consistent accuracy gains across a heterogeneous set of real-world demand profiles while retaining operational simplicity and transparency. The evidence supports hybridization as a robust default for short-term forecasting in fuel logistics and, more broadly, in energy-use contexts characterized by meaningful temporal structure and non-trivial variability.

Author Contributions

Conceptualization, D.K. and P.W.; methodology, D.K. and P.W.; formal analysis, D.K. and P.W.; resources, P.W.; data curation, D.K.; writing—original draft preparation, D.K. and P.W.; writing—review and editing, P.W.; visualization, D.K. and P.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets presented in this article are not readily available due to confidentiality agreements and cannot be shared. Requests to access the datasets should be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ARIMA: autoregressive integrated moving average
AI: artificial intelligence
LSTM: long short-term memory
ANN: artificial neural network
ACF: autocorrelation function
PACF: partial autocorrelation function
MAE: Mean Absolute Error
MAPE: Mean Absolute Percentage Error
SMAPE: Symmetric Mean Absolute Percentage Error
RMSE: Root Mean Squared Error

