Estimation and Forecasting of the Average Unit Cost of Energy Supply in a Distribution System Using Multiple Linear Regression and ARIMAX Modeling in Ecuador

Mendez-Santos, Pablo Alejandro; Chacón-Reino, Nathalia Alexandra; Guerrero-Vásquez, Luis Fernando; Ordoñez-Ordoñez, Jorge Osmani; Chasi-Pesantez, Paul Andrés

doi:10.3390/en18143659


by Pablo Alejandro Mendez-Santos ¹, Nathalia Alexandra Chacón-Reino ²,†, Luis Fernando Guerrero-Vásquez ²,*,†, Jorge Osmani Ordoñez-Ordoñez ² and Paul Andrés Chasi-Pesantez ²

¹ Distribution Department, Empresa Eléctrica Regional Centro Sur C.A., Ave Max Uhle y Av. Pumapungo, Cuenca EC010150, Ecuador

² Telecommunications and Telematics Research Group, Universidad Politécnica Salesiana, Cuenca EC010103, Ecuador

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Energies 2025, 18(14), 3659; https://doi.org/10.3390/en18143659

Submission received: 23 May 2025 / Revised: 30 June 2025 / Accepted: 8 July 2025 / Published: 10 July 2025

(This article belongs to the Special Issue Editorial Board Members’ Collection Series: Economic Growth, Energy Consumption and Carbon Emission)


Abstract

The accurate estimation of electricity supply costs has become increasingly relevant due to growing demand, variable generation sources, and regulatory changes in emerging power systems. This study models the average unit cost of electricity supply (USD/kWh) in Ecuador using multiple linear regression techniques and ARIMAX forecasting, based on monthly data from 2018 to 2024. The regression models incorporate variables such as energy demand, generation mix, transmission costs, and regulatory indices. To enhance model robustness, we apply three variable selection strategies: correlation analysis, PCA, and expert-driven selection. Results show that all models explain over 70% of price variability, with the highest-performing regression model achieving $R^{2} = 0.9887$. ARIMAX models were subsequently implemented using regression-based forecasts as exogenous inputs. The ARIMAX model based on highly correlated variables achieved a MAPE below 5%, showing high predictive accuracy. These findings support the use of hybrid statistical models for informed policy-making, tariff planning, and operational cost forecasting in structurally constrained energy markets.

Keywords: multiple linear regression; power distribution costs; electricity price forecasting; emerging energy markets

1. Introduction

Electricity price forecasting has become a critical topic as power markets undergo deregulation and experience increasing volatility. Since the beginning of this century, various authors have emphasized its relevance for both generators and consumers in operational planning and financial risk management. In this context, Contreras et al. [1] highlighted that, in competitive markets, the ability to anticipate spot prices is essential for maximizing revenues and minimizing costs, while later reviews, such as that by Aggarwal et al. [2], reinforced this view within deregulated market environments. More recently, several studies have addressed the challenge of price forecasting in highly volatile electricity markets by incorporating advanced machine learning techniques and artificial neural networks [3,4]. Tschora et al. [5] investigated the application of various machine learning algorithms to forecast prices in the European day-ahead market, with particular emphasis on the predictive potential of features such as historical prices from neighboring countries—an approach that has been underexplored to date. In this context, electricity price forecasting has emerged as a strategic tool for both market participants and regulators seeking stability and efficiency in energy supply.

The methodological landscape of electricity price forecasting is dominated by regression-based approaches, centered on linear regression models. Both standard multiple linear regression and regularized variants (such as LASSO, ridge, and elastic net) are employed to enhance model generalization and address multicollinearity [6,7]. In addition, quantile regression techniques have gained traction for probabilistic forecasting by capturing price uncertainty through the estimation of different quantiles of the distribution [8]. These regression methods are often embedded in hybrid frameworks that combine linear models with more complex algorithms (e.g., time series decomposition, decision tree ensembles, or neural networks) to exploit both linear effects and nonlinear patterns in the data [9,10].

Nevertheless, most studies have focused on markets in Europe, North America, and other developed economies, with limited research addressing emerging electricity markets. Dragašević et al. [11], for instance, analyzed a developing market by applying multivariable regression techniques to identify the fundamental factors influencing price formation. In the case of Ecuador, the electricity sector presents particular characteristics that distinguish it from extensively studied markets. The Ecuadorian generation matrix is highly dependent on hydropower: approximately 79% of electricity production came from hydroelectric sources in 2021 [12]. This reliance on a single source renders the system vulnerable to extreme climatic events. In fact, a severe drought triggered a national energy crisis in 2023, leading to extended electricity rationing through scheduled blackouts across the country [13]. This critical situation underscored the need to enhance energy price forecasting and planning tools in Ecuador, both to strengthen supply security and to inform energy policy decisions under uncertain conditions.

Motivated by this context, the present study aims to model the monthly average unit price of energy supply (USD/kWh) in Ecuador’s distribution system using multiple linear regression models. Unlike many studies focused on short-term and hourly spot prices, this analysis considers monthly behavior, which is relevant for national energy and tariff planning. Several key indicators from the electricity sector are used as explanatory variables: national energy demand, disaggregated generation by source type (hydropower, thermal, and other renewables), production costs (fixed and variable) and transmission costs, imported energy volume, and a settlement index reflecting economic adjustments in the wholesale market. These variables capture the main structural determinants of price in the Ecuadorian context, encompassing supply–demand balance, generation mix and operating costs, and international energy exchanges. To select the most relevant variables and avoid multicollinearity, three variable selection strategies were implemented: (1) expert-driven variable selection, (2) direct selection based on correlation with price, and (3) dimensionality reduction through principal component analysis (PCA), used both to construct latent factors and to select variables based on their factor loadings in the first principal components. The models were calibrated and validated using monthly data from the 2018–2024 period, covering years both before and after the 2023 energy crisis, which allows the robustness of the approach to be assessed under different market conditions. Additionally, an ARIMAX model was applied to perform price forecasting based on each of the proposed regression models, further enhancing the predictive capabilities of the analysis.

Despite the wide adoption of forecasting models such as PCA-enhanced regression and ARIMAX in developed markets, their application remains scarce in emerging energy systems with high structural dependencies and regulatory heterogeneity, such as Ecuador. In particular, no prior studies have systematically combined dimensionality reduction through PCA with multiple linear regression to explain electricity cost dynamics in Ecuador. Moreover, the use of ARIMAX models with exogenous regressors derived from cost-based linear models has not been previously implemented in the region. This study addresses this empirical gap by proposing an integrated modeling framework tailored to Ecuador’s unique generation profile, regulatory adjustments, and market volatility, thereby contributing to methodological innovation and regional planning strategies.

The remainder of the article is structured as follows. Section 2 presents the related works and discusses the relevant literature. Section 3 describes the background and context of the Ecuadorian electricity sector. Section 4 details the methodology employed, including the variable selection techniques, regression models, and ARIMAX model. Section 5 presents the results obtained and their analysis. Section 6 discusses the implications of these findings and potential future research directions, and finally, Section 7 outlines the main conclusions of the study.

2. Related Works

Accurate determination of the average unit price of energy supply is essential to ensure efficient management within electricity distribution systems. Several studies have addressed this issue using statistical techniques and advanced predictive models, particularly multiple linear regression, due to its ability to incorporate multiple explanatory variables and provide clear interpretations of the relationships among energy, economic, and regulatory factors. This section presents a synthesis of recent research applying mathematical approaches to model and forecast energy supply costs, highlighting the methodologies used, variables considered, results obtained, and key limitations identified, as summarized in Table 1. Such comparative analysis provides context for the contributions of the present study and supports the relevance of the proposed methodological approach.

Weron [14] presented a comprehensive review of the state of the art in electricity price forecasting (EPF). The study systematically compiled and analyzed a broad range of methods used over the previous 15 years, covering classical linear statistical models as well as nonlinear approaches such as GARCH, neural networks, and others. The review outlines the complexity of available solutions, discusses their strengths and weaknesses, and identifies both opportunities and challenges associated with forecasting tools. Additionally, the author looks ahead and emphasizes the need for more objective comparative studies (using identical datasets, robust error metrics, and statistical significance testing) to guide the next decade of EPF research. As a synthesis article, it does not propose new models or perform empirical validations, serving more as a theoretical reference than a practical guide for implementation.

Agudelo et al. [15] addressed electricity price forecasting in the Colombian power exchange using neural network techniques. They employed a nonlinear autoregressive neural network with exogenous inputs (NARX), incorporating variables such as electricity demand, the hydrothermal generation ratio, El Niño event probability, and reservoir storage levels, in addition to historical prices as autoregressive inputs. The NARX model exhibited strong predictive performance: the simulated prices reproduced 96% of the variability of the actual time series, with residuals behaving as white noise, demonstrating robust generalization even beyond the training period. However, the model relies heavily on hydrological conditions and specific local data, limiting its direct applicability in other contexts without appropriate recalibration.

Ramos et al. [16] analyzed monthly electricity prices in the Iberian market (MIBEL) using a structural model with economic and climatic variables. A multiple linear regression was applied with exogenous variables such as per capita consumption, heating and cooling degree days, the hydro productivity index (HPI), and the industrial production index (IPI), among others, to explain variations in electricity prices. The model explained approximately 53% of price variability in Portugal and 29% in Spain; it also found that higher values of indicators such as the HPI and IPI are associated with lower electricity prices. Two limitations of the study are the use of aggregated monthly series, which does not capture daily or hourly variations, and the presence of autocorrelation in the residuals, which suggests that the model could benefit from refinement to improve its predictive capacity.

Zheng et al. [17] proposed an approach for forecasting the Locational Marginal Price (LMP) by decomposing the problem into its price subcomponents. Specifically, the LMP was separated into energy, congestion, and loss components, with individual forecasting models developed for each component and subsequently combined into an ensemble forecast. This component-based strategy enabled more accurate and robust LMP predictions compared to traditional methods that treat the price as a monolithic variable. However, the effectiveness of the approach depends on correctly decomposing the price components; if the separation is inadequate or the input data are limited, the model’s advantages and performance may be compromised.

Ulgen and Poyrazoglu [18] examined key predictors for electricity price forecasting in the Turkish wholesale market. They applied a multiple linear regression model incorporating both lagged electricity prices (including moving averages) and exogenous fuel variables—natural gas, oil, and coal prices—within a dynamic estimation framework. Including historical prices and these energy inputs improved forecasting accuracy; these predictors proved significant for estimating electricity prices, substantially reducing prediction errors compared to simpler models. As a limitation, the study was based on data from Turkey with a short forecasting horizon (12 days), and it did not explore more complex nonlinear relationships in price dynamics, leaving room for future enhancement using more advanced techniques.

Zhang et al. [19] designed an optimized linear regression scheme for real-time electricity load forecasting to support market participants in their operational decisions. The approach focused on simplifying the model and enhancing its robustness using statistical criteria: stepwise variable selection based on the Akaike Information Criterion (AIC) and influence analysis (Cook’s distance) were employed to refine the model by addressing potential outliers and multicollinearity. Despite its simplicity, the resulting linear model achieved effective demand forecasts using only publicly available data, demonstrating the usefulness of a parsimonious and transparent approach for short-term applications. However, as the study is oriented toward load (demand) forecasting rather than price forecasting, its conclusions cannot be directly transferred to electricity price prediction models without additional considerations.

As evidenced by the preceding analysis, various advanced statistical methodologies and predictive models have been successfully applied in specific contexts, highlighting the importance of incorporating local, economic, and regulatory factors for effective electricity price forecasting. However, these approaches often exhibit limitations regarding the direct transferability of results across markets due to regional differences in institutional and corporate structures, energy resource availability, regulatory frameworks, and economic conditions. In this regard, the Ecuadorian electricity market—characterized by strong dependence on hydropower generation, climatic variability, and specific regulatory schemes—requires tailored studies that explicitly account for these local factors. Therefore, it becomes essential to develop customized mathematical models based on multiple linear regressions, calibrated with context-specific Ecuadorian variables, to accurately forecast the average unit price of energy supply in distribution systems. Such models would contribute to strengthening the technical and economic management of the national electricity system.

3. Energy Supply in the Ecuadorian Electricity Sector

The electricity supply industry is typically segmented according to its main activities: production (generation), transportation (transmission), distribution, and commercialization [21]. Depending on the corporate structure defined by national legislation, each activity is carried out by different institutions or companies dedicated to a specific function. The cost of supplying electricity to major load blocks connected to the distribution network is primarily determined by the sum of production and transmission costs.

Ecuador’s electricity generation matrix is diversified across primary sources, including renewable hydropower (with or without reservoirs), thermal generation from fossil fuels, and non-conventional renewable sources such as biomass, photovoltaic, and wind energy. However, hydropower remains predominant, covering 69.1% of national demand in 2023. Thermal generation accounted for 25.6% during the same period, while the remaining percentage was supplied by non-conventional renewable generation and electricity imports from the Colombian power system [22]. This composition of the generation matrix has a direct impact on both electricity production costs and the average supply price.

In Ecuador, the largest share of installed hydropower generation capacity is concentrated in the Amazon basin. Water resource availability in this region exhibits marked seasonal fluctuations, characterized by a significant decrease in precipitation—and consequently, river flow—between the months of September and November each year [23]. Since hydropower constitutes the main source of supply for national demand, its production costs and the average electricity prices are directly affected by this hydrological variability, tending to increase during periods of reduced water availability.

As shown in Figure 1, the variation in average energy purchase prices exhibits peaks and troughs that coincide with the dry and rainy seasons of the Ecuadorian Amazon basin—that is, higher prices from September through November or December each year. For the period from 2018 to 2022, price fluctuations remain relatively small; however, in 2023 and 2024, the most significant variations across the entire seven-year dataset are observed.

This behavior is partly explained by the Ecuadorian system’s strong dependence on hydropower generation. During periods of high water availability, the average price decreases, whereas in dry seasons the price increases—a pattern that, in 2023 and 2024, revealed significant vulnerability to hydrological variability [24]. Another contributing factor is the application, since 2016, of the generation and transmission cost settlement mechanism established by the Electricity Regulation and Control Agency (ARCONEL). This mechanism defines a cost redistribution index among distribution companies. In the analyzed case, the regulatory index has shown a sustained decline since 2017, with increases during the dry periods of 2023 and 2024 as mandated by ARCONEL.

Production costs are mostly classified as fixed costs, since they are incurred regardless of the volume of electricity produced. Additionally, variable costs are considered, which are directly correlated with the amount of energy generated during a specific period. This category includes items such as fuel supply and transportation, lubricants, and chemical products, among others [25]. The recognition and management of these costs are formalized through regulated energy purchase agreements, signed between generation companies and distribution and commercialization entities.

Additionally, energy transmission service costs are considered. Although not directly related to production, transmission is an essential service that enables the effective delivery of power and energy blocks to distribution networks, and ultimately to the load. According to current regulations [25], these costs are calculated based on a tariff that remunerates both fixed and variable costs of the state-owned transmission company.

One cost that has had a significant impact in recent years on the average supply price at the distribution network level is related to energy imports from the Colombian power system. During the period from October to December 2023, a severe drought event occurred in the Amazon basin, triggering a major energy crisis in Ecuador due to the reduced generation capacity of hydroelectric plants, even leading to energy rationing [24]. As a result, the national power system became heavily reliant on imported energy to mitigate the adverse effects of reduced domestic generation availability. However, the cost of imports was very high, as the neighboring country was also affected by the drought and relied on high-cost thermal sources for energy exports.

The hydrological crises of 2023–2024, particularly the severe drought in the Amazon basin, are empirically captured in the dataset by three variables: a significant decline in monthly hydropower generation (Hydropower), a sharp increase in cross-border energy imports (TIES), and adjustments in regulatory compensation schemes (Settlement ratio). These variables serve as quantitative proxies for the structural and climatic disruptions observed during the period and are explicitly incorporated into the regression and ARIMAX models.

Additionally, Figure 2 clearly shows that the settlement ratio has a significant effect on the average unit energy supply price. Until 2022, both variables remained relatively stable, with only minor fluctuations. However, starting in 2023, there is an abrupt change in the settlement ratio that coincides with a sharp increase in the average price, suggesting that regulatory changes have an immediate and notable impact on energy supply costs in Ecuador. This finding highlights the importance of explicitly incorporating regulatory factors in energy price modeling.

In the current context of the Ecuadorian electricity market—characterized by the absence of competition among producers and a strong dependence on variables such as primary resource availability (particularly hydropower), demand growth, and limited supply expansion—energy supply is inherently tied to the conditions and availability of imports from the Colombian power system.

Due to space constraints, detailed regulatory and technical specifications of Ecuador’s electricity system—including tariff composition, transmission cost structures, and contract mechanisms—are provided in Appendix A. This allows the main text to focus on the variables and modeling components directly relevant to cost forecasting.

4. Methodology

This study aims to analyze the relationship between the average electricity price (dependent variable) and various energy and economic variables (independent variables). The process begins with a description of each variable used, followed by a normality analysis. Based on the results of the normality analysis, appropriate statistical tests are selected. Subsequently, an analysis of significant differences, correlation among variables, and a principal component analysis are conducted. Finally, using the statistical information obtained, mathematical models are developed to identify the levels of fit and significance.
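As an illustration of the final step, the following is a minimal sketch of how a multiple linear regression and its coefficient of determination could be computed by ordinary least squares. The function name `fit_ols`, the variable names, and the synthetic data are assumptions introduced for illustration only; they are not the study's dataset or code.

```python
import numpy as np


def fit_ols(X, y):
    """Fit y = b0 + X @ b by ordinary least squares; return (coefficients, R^2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # intercept column first
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)           # unexplained variation
    ss_tot = float(np.sum((y - y.mean()) ** 2))     # total variation
    return coef, 1.0 - ss_res / ss_tot


# Synthetic stand-in for 84 monthly observations (2018-2024); NOT the study's data.
rng = np.random.default_rng(0)
n_months = 84
hydro = rng.normal(0.9, 0.05, n_months)      # hypothetical, in 1e8 kWh/month
imports = rng.exponential(0.02, n_months)    # hypothetical, in 1e8 kWh/month
price = 2.5 - 1.0 * hydro + 20.0 * imports + rng.normal(0.0, 0.05, n_months)

coef, r2 = fit_ols(np.column_stack([hydro, imports]), price)
print("intercept and slopes:", np.round(coef, 3))
print("R^2:", round(r2, 4))
```

The significance of individual coefficients (t-tests, p-values) is not covered by this sketch and would normally be obtained from a statistics package.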

To guide the empirical analysis, the study is structured around the following research hypotheses:

H1 (Expert-driven variable selection): A regression model constructed with variables selected based on domain knowledge (e.g., hydropower, thermal generation, energy imports) will provide interpretable insights into cost drivers, though with moderate predictive accuracy.
H2 (Correlation-based variable selection): A regression model based on variables with the highest statistical correlation with the average unit price will outperform other approaches in terms of predictive accuracy and statistical fit.
H3 (PCA-based variable reduction): A regression model using variables derived from principal component analysis will achieve acceptable forecasting accuracy while reducing multicollinearity and dimensionality, offering a parsimonious alternative.
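The correlation-based strategy in H2 can be sketched as a ranking of candidate variables by the absolute value of their correlation with the average price. The example below uses the Pearson coefficient purely for illustration; the helper `rank_by_correlation`, the cut-off `top_k`, and the synthetic series are hypothetical, and a rank-based coefficient could be substituted when the variables are not normally distributed.

```python
import numpy as np


def rank_by_correlation(data, target, top_k=3):
    """Return the top_k (name, r) pairs with the largest |Pearson r| against target."""
    y = np.asarray(data[target], dtype=float)
    scores = []
    for name, values in data.items():
        if name == target:
            continue  # do not correlate the target with itself
        r = np.corrcoef(np.asarray(values, dtype=float), y)[0, 1]
        scores.append((name, float(r)))
    # Sort by strength of association, regardless of sign.
    scores.sort(key=lambda item: abs(item[1]), reverse=True)
    return scores[:top_k]


# Synthetic monthly series; names mirror the paper's variables, values are made up.
rng = np.random.default_rng(0)
n_months = 84
hydro = rng.normal(0.9, 0.05, n_months)
ties = rng.exponential(0.02, n_months)
gnc_cost = rng.normal(1.0, 0.1, n_months)    # unrelated to price by construction
price = 2.5 - 1.0 * hydro + 20.0 * ties + rng.normal(0.0, 0.05, n_months)

dataset = {"price": price, "hydro": hydro, "ties": ties, "gnc_cost": gnc_cost}
selected = rank_by_correlation(dataset, target="price", top_k=2)
for name, r in selected:
    print(f"{name}: r = {r:+.3f}")
```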

Furthermore, each of these hypotheses is tested using ARIMAX modeling to evaluate the medium-term forecasting performance of the selected regression models. The comparative results aim to inform both methodological practices and regulatory planning in energy markets with structural constraints, such as Ecuador.
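To make the role of the exogenous input concrete, the sketch below fits a deliberately simplified ARX(1) model by least squares and scores a 12-month holdout with the MAPE. This is a stand-in, not the ARIMAX specification used in the study: it has no differencing and no moving-average terms, and in practice a dedicated time-series library would be used. The functions `fit_arx1`, `forecast_arx1`, and `mape`, the coefficients, and the synthetic series are all assumptions made for illustration.

```python
import numpy as np


def fit_arx1(y, x):
    """Least-squares fit of y[t] = c + phi * y[t-1] + beta * x[t]."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    design = np.column_stack([np.ones(len(y) - 1), y[:-1], x[1:]])
    params, *_ = np.linalg.lstsq(design, y[1:], rcond=None)
    return params  # (c, phi, beta)


def forecast_arx1(params, last_y, x_future):
    """Recursive multi-step forecast, feeding each prediction back as the lag."""
    c, phi, beta = params
    preds, prev = [], float(last_y)
    for x_t in x_future:
        prev = c + phi * prev + beta * float(x_t)
        preds.append(prev)
    return np.array(preds)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)


# Synthetic data: x plays the role of a regression-based price estimate.
rng = np.random.default_rng(1)
months = np.arange(84)
x = 2.5 + 0.5 * np.sin(2 * np.pi * months / 12) + rng.normal(0.0, 0.05, 84)
y = np.empty(84)
y[0] = x[0]
for t in range(1, 84):
    y[t] = 0.3 + 0.6 * y[t - 1] + 0.3 * x[t] + rng.normal(0.0, 0.03)

params = fit_arx1(y[:72], x[:72])                    # train on the first 72 months
preds = forecast_arx1(params, y[71], x[72:])         # forecast the last 12 months
holdout_mape = mape(y[72:], preds)
print("c, phi, beta:", np.round(params, 3))
print(f"holdout MAPE: {holdout_mape:.2f}%")
```

Note that the forecast requires future values of the exogenous series, which is why the regression-based estimates must themselves be available over the forecast horizon.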

To assess the robustness and interpretability of the modeling framework, three different variable selection strategies were implemented and compared. The first approach, based on expert judgment, reflects practical knowledge of the Ecuadorian electricity sector and its structural dependencies. The second approach, driven by statistical correlation, identifies the strongest direct relationships with the dependent variable and aims to maximize predictive power. The third approach applies PCA to explore the underlying structure of the data, reduce dimensionality, and address multicollinearity. This comparative setup allows the evaluation of trade-offs between model accuracy, explanatory interpretability, and complexity, thereby providing a robust basis for selecting the most suitable modeling strategy for policy and planning applications.
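The PCA-based strategy can be sketched as follows: standardize the variables, extract the principal components, and inspect both the share of variance each component explains and the weight of each variable on the first component. The function `pca_standardized` and the synthetic variables are hypothetical; the component weights returned here are unit-norm eigenvectors, which are often loosely referred to as loadings.

```python
import numpy as np


def pca_standardized(X):
    """PCA on z-scored columns; return (explained-variance ratios, component weights)."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # put variables on one scale
    _, singular_values, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return explained, Vt.T   # column j holds the weights of component j


# Synthetic example: two strongly related variables and one independent variable.
rng = np.random.default_rng(2)
n_months = 84
thermal = rng.normal(0.3, 0.05, n_months)
variable_cost = 4.0 * thermal + rng.normal(0.0, 0.02, n_months)
demand = rng.normal(1.0, 0.03, n_months)

names = ["thermal", "variable_cost", "demand"]
explained, weights = pca_standardized(np.column_stack([thermal, variable_cost, demand]))
top = names[int(np.argmax(np.abs(weights[:, 0])))]

print("explained variance ratios:", np.round(explained, 3))
print("variable with the largest weight on PC1:", top)
```

Because the two correlated variables collapse onto a single component, the example also shows why PCA is a natural tool for addressing multicollinearity.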

4.1. Analysis Variables

The variables considered are defined and described below:

Average price (USD cents/kWh): This variable represents the unit average value per kWh purchased to supply the load in a given month. It is calculated as the ratio between the total cost of energy purchased during the month and the total amount of energy purchased in the same period.
Demand (kWh/month): This is the total amount of energy consumed per month by the load. It represents the overall energy requirement that must be met by various generation sources and imports.
Hydropower energy (kWh/month): This is the amount of hydroelectric energy used to supply the load. It reflects the contribution of hydropower to the energy mix.
TIES (kWh/month): International Electricity Transactions; this is the amount of imported energy from the Colombian system used to supply the load. It indicates the reliance on and utilization of cross-border electricity.
Thermal energy (kWh/month): This is the amount of thermoelectric energy used to supply the load. It reflects the share of thermal generation in meeting the demand.
Other-source energy (kWh/month): This is the amount of energy from other technologies (wind, photovoltaic, biogas, etc.) used to supply the load. It represents the contribution of non-conventional renewable sources and other emerging technologies.
Fixed cost CR (USD/month): Cost without index application; this is the total monthly cost paid to generators with regulated energy purchase contracts for investment, amortization, and other components not dependent on the energy produced. It includes items such as investment recovery, amortizations, and other fixed charges associated with the availability of contracted generation.
Variable cost CR (USD/month): Cost without index application; this is the total monthly cost paid to generators with regulated energy purchase contracts for components that depend on the amount of energy produced. It mainly includes fuel costs and other variable inputs associated with generation.
GNC cost (USD/month): Costs of non-conventional generation; this is the total monthly cost paid to generators with or without energy purchase contracts whose production is based on renewable or non-conventional energy sources (wind, photovoltaic, biogas, etc.). It includes the variable costs associated with this type of generation.
TIES cost (USD/month): This is the total monthly cost paid for energy imports from the Colombian power system. It reflects the expense associated with acquiring electricity from the neighboring market.
TFT cost (USD/month): This is the total monthly cost paid for the energy transmission service. It covers the charges associated with the use of transmission networks.
Settlement ratio: Regulatory factor for reallocating energy purchase costs. According to the methodology established in the regulation, it is applied as a multiplier to generation (excluding TIES) and transmission costs. Its function is to reassign energy purchase costs among different market agents in accordance with current regulations.
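The definitions above suggest how the dependent variable relates to the cost components. The sketch below is one possible reading, not the regulatory formula: it assumes that the settlement ratio multiplies the sum of the fixed, variable, GNC, and TFT costs, that the TIES cost is added without adjustment, and that the energy purchased equals the monthly energy supplied. The function name and all figures are hypothetical.

```python
def average_unit_price_cents(fixed_cr, variable_cr, gnc, tft, ties_cost,
                             settlement_ratio, energy_kwh):
    """Average unit price in USD cents/kWh from monthly costs (USD) and energy (kWh)."""
    if energy_kwh <= 0:
        raise ValueError("energy_kwh must be positive")
    # Assumption: the ratio applies to generation (excluding TIES) and transmission.
    adjusted = settlement_ratio * (fixed_cr + variable_cr + gnc + tft)
    total_cost_usd = adjusted + ties_cost        # imports are not adjusted
    return 100.0 * total_cost_usd / energy_kwh   # USD/kWh -> USD cents/kWh


# Made-up monthly figures, for illustration only.
example_price = average_unit_price_cents(
    fixed_cr=1_500_000, variable_cr=500_000, gnc=200_000, tft=300_000,
    ties_cost=250_000, settlement_ratio=0.9, energy_kwh=1.0e8,
)
print(f"{example_price:.2f} USD cents/kWh")
```

Under these assumptions, a rise in either the settlement ratio or the TIES cost raises the average price even when demand and the generation mix are unchanged.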

4.2. Normality Analysis of the Variables

To ensure the validity of the statistical methods applied in this study, a normality analysis was performed on each of the variables considered in the model. This analysis assesses whether the variable distributions conform to the normality assumption, which is a fundamental requirement for the application of parametric techniques such as multiple linear regression. The evaluation included both formal statistical tests and graphical representations, aimed at identifying potential skewness, kurtosis, or outlier behavior that could affect the robustness of the proposed model.
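A minimal sketch of this kind of check is given below, assuming SciPy is available. It applies the Shapiro–Wilk test and computes the correlation of the Q–Q plot points for two synthetic samples; the sample names, sizes, and the 0.05 significance level are assumptions for illustration and do not reproduce the study's data.

```python
import numpy as np
from scipy import stats

# Two synthetic samples of 84 "monthly" values; NOT the study's variables.
rng = np.random.default_rng(42)
samples = {
    "approximately_normal": rng.normal(1.0, 0.1, 84),
    "right_skewed": rng.lognormal(0.0, 1.0, 84),
}

results = {}
for name, values in samples.items():
    w_stat, p_value = stats.shapiro(values)   # Shapiro-Wilk statistic and p-value
    # probplot returns the Q-Q points and a least-squares line through them;
    # r close to 1 means the points lie near the reference line.
    (_, _), (_, _, qq_r) = stats.probplot(values, dist="norm")
    results[name] = {"W": float(w_stat), "p": float(p_value), "qq_r": float(qq_r)}
    verdict = "reject normality" if p_value < 0.05 else "do not reject normality"
    print(f"{name}: W = {w_stat:.3f}, p = {p_value:.3g}, Q-Q r = {qq_r:.3f} ({verdict})")
```

A small p-value leads to rejecting normality, whereas a large p-value only means that the test found no evidence against it.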

Figure 3a shows the distribution of the average price, which is neither symmetric nor unimodal: most values are concentrated between 2 and 3 USD cents/kWh, while some months exhibit relatively high prices (7–8 USD cents/kWh), forming a separate cluster of elevated values. In the Q–Q plot, a noticeable deviation from the red reference line is observed, particularly in the upper and lower tails. This deviation confirms that the average price does not follow a normal distribution (Shapiro–Wilk: $p \approx 7.8 \cdot 10^{-14}$), so the normality assumption must be rejected.

Figure 3b shows that the distribution of Demand is not symmetric: it has a single mode between $0.95 \cdot 10^{8}$ and $1.05 \cdot 10^{8}$ kWh/month, and the substantial bars toward the right suggest the influence of some high values. In the Q–Q plot, a deviation from the red reference line is observed in both the upper and lower tails. This deviation indicates that Demand does not follow a normal distribution (Shapiro–Wilk: $p \approx 0.00191$), leading to the rejection of the normality assumption.

Figure 3c shows that the distribution of Hydroelectric Power is not symmetric, with a main peak between $0.85 \cdot 10^{8}$ and $0.95 \cdot 10^{8}$ kWh/month. In the Q–Q plot, a deviation similar to that of Demand is observed. This deviation indicates that Hydroelectric Power does not follow a normal distribution (Shapiro–Wilk: $p \approx 0.0001067$), leading to the rejection of the normality assumption.

The following paragraphs summarize the main characteristics of the remaining variable distributions and the conclusions derived from the normality tests. For a more detailed analysis of the distributions and Q–Q plots of the variables studied, refer to Appendix A.

The variables TIES and Cost TIES exhibit non-symmetric distributions, with a single mode near zero and a long right tail (right skewness). The Q–Q plots show significant deviations in both tails, confirming that TIES and Cost TIES do not follow a normal distribution (Shapiro–Wilk: $p \approx 3.058 \cdot 10^{-13}$ and $p \approx 2.985 \cdot 10^{-3}$, respectively).

In contrast, the variable GNC cost displays an approximately symmetric distribution with slight right skewness. The Q–Q plot indicates a reasonable alignment with normality, especially in the central region. Normality is therefore not rejected for this variable (Shapiro–Wilk: $p \approx 0.5593$).

The variables Thermal Energy and Variable cost CR exhibit non-symmetric distributions, with a primary mode around 0.5 and secondary peaks. Their Q–Q plots reveal significant deviations, confirming that neither variable follows a normal distribution (Shapiro–Wilk: $p \approx 4.36 \cdot 10^{-7}$ and $p \approx 4.501 \cdot 10^{-8}$, respectively). In the case of Fixed cost, although the central portion of the distribution aligns with normality, the tails show clear deviations, leading to a rejection of the normality assumption (Shapiro–Wilk: $p \approx 2.199 \cdot 10^{-11}$).

The variables Energy from other sources and Settlement ratio exhibit bimodal and multimodal distributions, respectively. Their Q–Q plots show irregular deviations, indicating that they do not follow a normal distribution (Shapiro–Wilk:

p \approx 3.489 \cdot 10^{- 6}

and

p \approx 2.834 \cdot 10^{- 10}

). Finally, the variable Cost TFT is moderately symmetric, with slight right skewness. Although the central region aligns with the normal distribution, the tails exhibit deviations, leading to rejection of normality (Shapiro–Wilk:

p \approx 0.01271

).

In general, most of the analyzed variables do not follow a normal distribution. Shapiro–Wilk test results show very low p-values (

p < 0.01

) for most variables, indicating their distributions differ significantly from normality. The only variable with

p > 0.05

was GNC cost, whose distribution, as observed, is approximately normal. These deviations may be due to the presence of skewness (for example, Demand and Hydroelectric Power exhibit right tails), high kurtosis, or even bimodal/multimodal distributions caused by structural changes during the period (as with average price, which experienced an abrupt change in mid-2023).

Given the lack of normality in most variables, it is wise to be cautious when running analyses that assume normality. Applying transformations may be considered if it is necessary to get closer to a normal distribution. However, for our main goal of variable selection and price modeling, this non-normality is not necessarily damaging. It simply suggests that robust methods could be a better choice. At the very least, the model assumptions need to be verified down the line.

4.3. Analysis of Significant Differences

Group comparison tests are used to examine whether the variables change significantly over time or across other groupings. Since the normality analyses indicate that most variables do not follow a normal distribution, it is more appropriate to replace parametric tests (Student’s t and ANOVA) with non-parametric alternatives (Mann–Whitney U, Kruskal–Wallis H, Wilcoxon signed-rank, Friedman).

Because more than two independent groups are compared (the years 2018 to 2024), the Kruskal–Wallis H test is used to determine whether the group medians differ significantly. The obtained value of

H = 39.8675

with

p = 4.837 \cdot 10^{- 7}

indicates highly significant differences among the analyzed years. This implies that the groups are not homogeneous in terms of the dependent variable average price.
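To make the statistic concrete, the following is a minimal sketch of how a Kruskal–Wallis H value can be computed from grouped observations. The toy groups are invented for illustration and are not the study's data; the function names are hypothetical, and the tie correction is omitted, so the sketch assumes no tied values across groups.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction applied)."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)


# Toy example: three fully separated groups (illustrative values only).
toy_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(round(kruskal_wallis_h(toy_groups), 4))
```

Under the null hypothesis, H is compared against a chi-square distribution with (number of groups minus one) degrees of freedom, which for seven yearly groups would be six.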

Figure 4 shows that the average energy prices (in USD cents/kWh) exhibit low variability and consistent values between 2018 and 2022, with medians between roughly 2 and 3 USD cents/kWh depending on the year. In contrast, the years 2023 and 2024 show notable increases in both medians and variability, indicating a significant change in price patterns. Additionally, outliers are identified in 2020, 2021, 2022, 2023, and 2024, highlighting unusual fluctuations during these periods.

These results, supported by the Kruskal–Wallis H statistical analysis, suggest significant differences between the years, with a marked increase in prices in the most recent years.

To complement the Kruskal–Wallis H results and identify specific pairwise differences in average energy prices across years, Dunn’s post hoc test was applied with Bonferroni correction. The resulting heatmap (Figure 5) presents adjusted p-values for all year-to-year comparisons. Darker cells correspond to non-significant differences (

p > 0.05

), while lighter tones highlight statistically significant contrasts.
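The Bonferroni adjustment used with Dunn's test can be sketched as follows. The raw p-values are hypothetical placeholders, not the study's results, and the helper name is an assumption. The sketch only illustrates why seven yearly groups produce 21 pairwise comparisons and why many adjusted p-values are reported as 1.000.

```python
from itertools import combinations


def bonferroni_adjust(p_values):
    """Multiply each raw p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


years = range(2018, 2025)
year_pairs = list(combinations(years, 2))  # 7 * 6 / 2 = 21 pairwise comparisons

# Hypothetical raw p-values: one clearly different pair, all others unremarkable.
raw = {pair: 0.5 for pair in year_pairs}
raw[(2021, 2024)] = 1e-5

adjusted = dict(zip(year_pairs, bonferroni_adjust([raw[p] for p in year_pairs])))
print(len(year_pairs), adjusted[(2019, 2020)], adjusted[(2021, 2024)])
```

Any raw p-value above 1/21 is capped at 1 after adjustment, which is why stable periods tend to show adjusted values of 1.000 in such a heatmap.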

Statistically meaningful differences emerge between 2024 and most previous years, with particularly low p-values observed in comparisons with 2019 (

p = 0.006

), 2022 (

p = 0.016

), and 2021 (

p < 0.001

). These results reflect a sharp deviation in average unit prices during 2024, likely influenced by persistent effects of the 2023 hydrological crisis and ongoing regulatory adjustments. By contrast, the comparisons between 2023 and 2020 (

p = 0.359

) and between 2023 and 2021 (

p = 0.359

) do not reach significance at the 0.05 level, which is consistent with transitional price behavior leading up to the crisis period.

Conversely, periods such as 2019–2022 exhibit no significant pairwise differences (

p \approx 1.000

), suggesting a relatively stable pricing regime prior to major climatic and market disruptions. These findings reinforce the temporal segmentation observed in the boxplot analysis, providing further statistical support for claims of structural shifts in energy price dynamics beginning in late 2022.

4.4. Correlation Analysis

Relationships among the different variables are analyzed using Spearman’s rank correlation coefficient, which is appropriate given the non-normality of most variables. This coefficient measures the monotonic association between two quantitative variables. Values close to

\pm 1

indicate a strong monotonic correlation (positive or negative), while a value near 0 indicates little or no monotonic relationship.
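A minimal sketch of the coefficient follows: Spearman's correlation is the Pearson correlation computed on ranks. The data are invented for illustration and the function names are hypothetical. The example shows that a monotonic but non-linear relationship still yields a Spearman coefficient of 1, whereas the Pearson coefficient is lower.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(round(spearman(x, y), 4), round(pearson(x, y), 4))
```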

Figure 6 shows the correlation matrix calculated among all numerical variables in the dataset, including the average price. The column and row corresponding to the average price (USD cents/kWh) are notable, as it is the dependent variable. It exhibits a strong positive correlation with several cost variables, such as Cost TIES (

r \approx 0.85

), Settlement ratio (

r \approx 0.83

), Fixed cost CR (

r \approx 0.78

), Variable cost CR (

r \approx 0.77

), and generated Thermal Energy (

r \approx 0.77

). This suggests that the unit energy price increases as associated costs rise, especially with higher generation of Thermal Energy. Conversely, the average price shows a strong negative correlation with Hydroelectric Power (

r \approx - 0.79

), indicating that in months with higher hydropower generation, the average price tends to be lower.

Significant relationships are also observed among the independent variables. Hydroelectric Power is negatively correlated with Thermal Energy (

r \approx - 0.82

) and with Variable cost CR (

r \approx - 0.79

). This reflects that in months with higher hydropower contribution, thermal generation and the variable fuel-related costs decrease.

Thermal Energy shows a very strong correlation with Variable cost CR (

r \approx 0.99

), most likely because variable costs depend directly on the amount of thermal energy produced (such as fuel expenses). This extremely high correlation indicates near-perfect collinearity between these two variables. Additionally, Cost TIES is strongly correlated with both Variable cost CR (

r \approx 0.72

) and Thermal Energy (

r \approx 0.69

), suggesting that increases in internal thermal generation may coincide with rises in import or exchange costs (TIES).

4.5. Principal Component Analysis

To reduce data dimensionality and capture the maximum possible variability, a principal component analysis was conducted. This method allows the identification of patterns in the data and prioritizes those components that explain a significant proportion of the total variability.

The loadings of the eleven independent variables indicate the relative contribution of each variable to the formation of the principal components. Each principal component (PC) is a linear combination of the original variables.

To determine the optimal number of principal components, the cumulative variance explained by each component was analyzed. As shown in Figure 7, the first three principal components explain approximately 95% of the total data variability, exceeding the 90% threshold (considered a standard criterion in this analysis). Consequently, the first three principal components were selected for subsequent analyses.
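The selection rule described above can be sketched in a few lines. The eigenvalues below are hypothetical, chosen only so that eleven standardized variables yield roughly 95% cumulative variance at the third component; they are not taken from the study, and the function name is an assumption.

```python
def components_needed(eigenvalues, threshold=0.90):
    """Return (k, cumulative_ratio) for the smallest k reaching the threshold."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value / total
        if cumulative >= threshold:
            return k, cumulative
    return len(eigenvalues), cumulative


# Hypothetical eigenvalues of an 11-variable correlation matrix (they sum to 11).
eigenvalues = [6.6, 2.5, 1.35, 0.2, 0.1, 0.08, 0.06, 0.05, 0.03, 0.02, 0.01]
k, ratio = components_needed(eigenvalues, threshold=0.90)
print(k, round(ratio, 3))
```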

Analyzing the extracted principal components reveals significant patterns in the data variability. Principal Component 1 (PC1) is strongly influenced by Thermal Energy (kWh/month) (

r = 0.41

), Variable cost CR (USD/month) (

r = 0.41

), and Cost TIES (USD/month) (

r = 0.37

). This suggests that PC1 is primarily related to thermal energy generation and associated variable costs, highlighting the relevance of these variables in the total variability.

In contrast, Principal Component 2 (PC2) shows a marked association with Demand (kWh/month) (

r = - 0.62

), Cost TFT (USD/month) (

r = - 0.48

), and GNC cost (USD/month) (

r = - 0.38

). This indicates that PC2 captures a dimension of variability linked to specific energy demands and costs associated with certain generation technologies, representing a distinct axis of variation from PC1.

Finally, Principal Component 3 (PC3) is primarily defined by Energy from other sources (kWh/month) (

r = - 0.65

), GNC cost (USD/month) (

r = - 0.53

), and the Settlement ratio (

r = - 0.32

). PC3 reflects variations related to less predominant energy sources and their impact on overall costs.

Together, these results demonstrate that the first principal components successfully extract fundamental aspects of the analyzed energy system, encompassing costs associated with thermal generation, demand variations, and specific costs of certain energy sources. Based on this identification, relevant variables from these principal components were selected, as detailed in Table 2.

5. Results

The results are presented in four specific sections: (1) linear regression models using variables selected according to the authors’ criteria, (2) linear regression models employing variables with the highest correlation, (3) linear regression models using variables defining the principal components, and (4) autoregressive models for price forecasting.

In analyzing the results, it is necessary to consider that the variables (both dependent and independent) do not fully comply with the assumptions of normality and linearity. Additionally, significant differences are observed among the studied years, with a notable increase in prices during the most recent periods, thereby increasing the complexity of the resulting models.

5.1. Regression Model Using Variables Selected According to Professional Criteria

Figure 8 presents the results of the multiple linear regression model for an initial selection of variables. It shows the relationship between the average kilowatt-hour price and Hydroelectric Power (kWh/month) (HP), TIES (kWh/month), Thermal Energy (kWh/month) (TE), and Energy from other sources (kWh/month) (EOS). Blue points represent observed values, while the red dashed line is included as an ideal fit reference. The considerable dispersion of the blue points around this reference line suggests a significant deviation from the ideal linear behavior expected in the relationship between the independent variables and price.

The fitted linear regression equation

C = 7.09 - 7 \times 10^{- 8} \cdot H P + 4 \times 10^{- 8} \cdot T I E S + 7 \times 10^{- 8} \cdot T E + 28 \times 10^{- 8} \cdot E O S

indicates the coefficients for each predictor variable. The coefficient of determination (

R^{2}

) is 0.7035, suggesting that approximately 70.35% of the variability in the average kilowatt-hour price can be explained by the variables included in this model. The extremely low p-value (

4.003 \times 10^{- 20}

) indicates that, overall, the model is statistically significant for predicting the average kilowatt-hour price.
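As a worked illustration of how the fitted equation is read, the sketch below evaluates it for two hypothetical months. The coefficients and intercept are those of the equation above; the input energy volumes are invented for illustration, and the result is assumed to be in USD cents/kWh, in line with the units used for the average price elsewhere in the text.

```python
# Coefficients of the first expert-selected model, as reported in the equation.
INTERCEPT = 7.09
COEFFS = {"HP": -7e-8, "TIES": 4e-8, "TE": 7e-8, "EOS": 28e-8}


def predict_price(inputs):
    """Evaluate C = intercept + sum(coefficient * monthly energy in kWh)."""
    return INTERCEPT + sum(COEFFS[name] * inputs[name] for name in COEFFS)


# Hypothetical monthly volumes in kWh (not taken from the dataset).
normal_month = {"HP": 9.0e7, "TIES": 0.0, "TE": 1.0e7, "EOS": 5.0e6}
# Same month with 1e7 kWh shifted from hydropower to thermal generation.
dry_month = {"HP": 8.0e7, "TIES": 0.0, "TE": 2.0e7, "EOS": 5.0e6}

print(round(predict_price(normal_month), 2), round(predict_price(dry_month), 2))
```

In this sketch, shifting 10 GWh from hydropower to thermal generation raises the predicted price by about 1.4 units, because the two coefficients have equal magnitude and opposite sign.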

Figure 9 presents the results of the multiple linear regression model for the second variable selection, where the average kilowatt-hour price is modeled as a function of Fixed cost CR (USD/month) (FcCR), Variable cost CR (USD/month) (VcCR), GNC cost (USD/month) (CGNC), Cost TIES (USD/month) (CTIES), and Cost TFT (USD/month) (CTFT).

The fitted regression equation

C = - 0.07 + 236 \times 10^{- 8} \cdot F c C R + 100 \times 10^{- 8} \cdot V c C R - 30 \times 10^{- 8} \cdot C G N C + 115 \times 10^{- 8} \cdot C T I E S - 373 \times 10^{- 8} \cdot C T F T

, together with a high coefficient of determination (

R^{2} = 0.8961

), suggests that approximately 89.61% of the variability in the average kilowatt-hour price is explained by the variables included in this model. The extremely low p-value (

7.172 \times 10^{- 37}

) confirms the overall statistical significance of the model for predicting the average kilowatt-hour price based on these costs.

Figure 10 shows the multiple linear regression model for the third variable selection, where the average kilowatt-hour price is explained by Hydroelectric Power (kWh/month) (HP), TIES (kWh/month) (TIES), Thermal Energy (kWh/month) (TE), Cost TIES (USD/month) (CTIES), and the Settlement ratio (SR). The fitted regression equation is

C = 2.29 - 3 \times 10^{- 8} \cdot H P - 9 \times 10^{- 8} \cdot T I E S + 2 \times 10^{- 8} \cdot T E + 4 \times 10^{- 8} \cdot E O S + 120 \times 10^{- 8} \cdot C T I E S + 4.34 \cdot S R

. The proximity of the observed points to the red dashed line, which represents an ideal fit, indicates the predictive capability of the model.

The high coefficient of determination (

R^{2} = 0.9654

) suggests that approximately 96.54% of the variability in the average kilowatt-hour price is explained by the model variables. The extremely low p-value (

4.246 \times 10^{- 54}

) confirms the overall statistical significance of the model for predicting the average kilowatt-hour price based on these variables.

5.2. Regression Models Using Variables with Highest Correlation

The multiple linear regression model constructed from variables with significant correlations identified in Figure 6 reveals a strong association between the average kilowatt-hour price and the considered predictor variables: Hydroelectric Power (HP), TIES (TIES), Thermal Energy (TE), Fixed cost CR (FcCR), Variable cost CR (VcCR), Cost TIES (CTIES), and the Settlement ratio (SR). The relationship estimated by the model is expressed by the equation

C = 0.81 - 4 \times 10^{- 8} \cdot H P - 5 \times 10^{- 8} \cdot T I E S - 6 \times 10^{- 8} \cdot T E + 153 \times 10^{- 8} \cdot F c C R + 112 \times 10^{- 8} \cdot V c C R + 98 \times 10^{- 8} \cdot C T I E S + 3.76 \cdot S R

.

Figure 11 shows a high concentration of blue points around the ideal red reference line, visually suggesting excellent predictive capability of the model. This is statistically confirmed by a coefficient of determination (

R^{2}

) of 0.9887, indicating that approximately 98.87% of the variability in the average kilowatt-hour price is explained by the linear combination of independent variables included in the model. The model’s p-value is extremely low (

2.78 \times 10^{- 71}

), evidencing robust statistical significance of the model as a whole for predicting the average kilowatt-hour price.

5.3. Regression Models Using Principal Components

PCA was applied to reduce data dimensionality and prioritize the variables with the greatest variability, and significant patterns were identified in the first three principal components. Based on the variable loadings in these components, six representative variables were selected for constructing a multiple linear regression model: Demand (kWh/month) (DCSUR), Thermal Energy (kWh/month) (TE), Energy from other sources (kWh/month) (EOS), Variable cost CR (USD/month) (VcCR), GNC cost (USD/month) (CGNC), and Cost TFT (USD/month) (CTFT). Figure 12 reveals a substantial relationship between the average kilowatt-hour price and the linear combination of these six variables. The fitted model equation is

C = 6.19 + 1 \times 10^{- 8} \cdot T E + 330 \times 10^{- 8} \cdot V c C R - 9 \times 10^{- 8} \cdot D C S U R - 175 \times 10^{- 8} \cdot C T F T + 40 \times 10^{- 8} \cdot E O S + 203 \times 10^{- 8} \cdot C G N C

.

A coefficient of determination (

R^{2}

) of 0.7135 indicates that approximately 71.35% of the variability in the average kilowatt-hour price can be explained by this reduced model. The p-value associated with the model is highly significant (

5.237 \times 10^{- 19}

), suggesting that the model as a whole has statistically relevant predictive capability for the average kilowatt-hour price using the variables identified as important through principal component analysis.

To model the average kilowatt-hour price using the condensed information from the original variables, a linear regression model was fitted on the first three principal components (PC1, PC2, and PC3), which together account for a significant proportion of the total data variability. Figure 13 shows the relationship between the actual average kilowatt-hour price values and those predicted by the model based on the principal components. The fitted linear regression model equation is

y = 2.77 + 0.68 \cdot P C 1 - 0.29 \cdot P C 2 + 0.57 \cdot P C 3

.

The coefficient of determination (

R^{2}

) of 0.6504 indicates that approximately 65.04% of the variability in the average kilowatt-hour price can be explained by this model. The associated p-value of the model is highly significant (

3.23 \times 10^{- 18}

), suggesting that the model based on the principal components has statistically relevant predictive capability for the average kilowatt-hour price, capturing an important portion of the variance through these linear combinations of the original variables.

Table 3 presents a summary detailing the different linear models implemented for determining the average unit energy supply price. This table facilitates comparison of the models in terms of their explanatory power (

R^{2}

), statistical significance (p-value), variables included in each model, and their respective equations.
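When models with different numbers of predictors are compared by explanatory power, the adjusted coefficient of determination is a common complement to the raw value. The sketch below applies the standard formula to two of the reported values. The sample size of 84 is an assumption (a complete monthly series from 2018 to 2024), and the function name is hypothetical.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


N_OBS = 84  # assumed: 7 years of monthly observations

# Correlation-based model: reported R^2 = 0.9887 with 7 predictors.
print(round(adjusted_r2(0.9887, N_OBS, 7), 4))
# Principal-component model: reported R^2 = 0.6504 with 3 predictors.
print(round(adjusted_r2(0.6504, N_OBS, 3), 4))
```

Under this assumed sample size the penalty for seven predictors is small, so the ranking of the models by explanatory power would not change.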

In the highest-performing regression model (

R^{2} \approx 0.9887

), the settlement ratio coefficient is 3.76. This implies that, holding other variables constant, a unit increase in the settlement ratio is associated with an increase of 3.76 USD cents/kWh in the average unit cost of energy supply. Since the settlement ratio is a regulatory factor applied multiplicatively to cost components, such as generation and transmission, its direct impact on the unit price is expected. Thus, the positive coefficient is consistent with the regulatory transmission of cost burdens to the final tariff.

Similarly, hydropower energy exhibits a negative coefficient (

- 4 \times 10^{- 8}

), meaning that as hydropower generation increases, the average energy cost decreases. This aligns with expectations, as hydroelectric sources are lower-cost and displace more expensive thermal or imported energy.

Some models show negative coefficients for TIES (imported energy), which may seem counterintuitive. However, this could result from multicollinearity with thermal energy and fixed costs, as both tend to rise during crisis periods, masking the true marginal cost of TIES. Additionally, the low frequency of high TIES values (e.g., only during extreme droughts) may cause underestimation of their real impact when using monthly data.

5.4. Autoregressive Integrated Moving Average Model with Exogenous Variables for Cost Forecasting

Prediction of the average unit value of the kilowatt-hour was addressed using an Autoregressive Integrated Moving Average model with exogenous variables (ARIMAX). This time series methodology is used to model and forecast variables that evolve over time, incorporating both the internal dynamics of the series (through its own past values and errors) and the influence of external variables. In this case, the predictions generated by the previously selected linear regression model, based on variables with the highest correlation to the average price, were included as an exogenous variable to enhance the predictive capacity of the ARIMAX model by considering external causal factors.

For the correct application of an ARIMAX model, it is crucial to ensure the stationarity of the time series, which implies that its statistical properties (mean and variance) remain constant over time. The original average price series, shown in Figure 14, is non-stationary: it exhibits trends and time-dependent patterns.

Second-order differencing, as shown in Figure 15, was required to induce the stationarity necessary for modeling. Additionally, selecting the orders of the autoregressive (AR), integrated (I, the number of differencing operations), and moving average (MA) components of the ARIMAX model, together with the proper identification and specification of exogenous variables, is a fundamental step in capturing the complexity of the time series and the influence of relevant external factors.
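The differencing operation itself is simple to state in code. The sketch below is a generic illustration with an invented series, not the price data; the function name is an assumption. It shows that second-order differencing removes a quadratic trend, leaving a constant series.

```python
def difference(series, order=1):
    """Apply first differences `order` times; each pass shortens the series by one."""
    result = list(series)
    for _ in range(order):
        result = [later - earlier for earlier, later in zip(result, result[1:])]
    return result


# Illustrative series with a quadratic trend.
trend = [t ** 2 for t in range(1, 8)]   # 1, 4, 9, 16, 25, 36, 49
print(difference(trend, order=1))       # still trending upward
print(difference(trend, order=2))       # constant after the second pass
```

Each differencing pass discards one observation, so a series differenced twice (d = 2 in ARIMAX notation) has two fewer points than the original.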

In this study, three linear regression approaches based on different hypotheses were evaluated to identify the most suitable exogenous variable for the ARIMAX prediction model. The first approach consisted of regression models using variables selected according to the criteria of industry professionals, incorporating expert knowledge to identify factors potentially influencing the average kilowatt-hour price. The second approach explored regression models employing variables with the highest statistical correlation to the average price, while the third approach used a model based on principal components derived from PCA, aiming to capture the key variability of the data empirically with reduced dimensionality.

Predictions obtained from each of the evaluated linear regression models, representing different strategies for selecting predictor variables, were individually considered as potential exogenous variables for the ARIMAX model. The central hypothesis of this stage was that the inclusion of the prediction from the linear regression model demonstrating the most robust relationship with the average price (whether based on expert knowledge or derived from statistical analysis of correlation and principal components) would enrich the ARIMAX time series model with valuable information, thereby leading to improved prediction accuracy.

5.4.1. ARIMAX Model Based on First Variable Selection

Prediction of the average unit value of the kilowatt-hour was performed using as an exogenous variable the predictions derived from the first variable selection in the linear regression model.

Figure 16 illustrates the time series of actual values and the prediction obtained from the ARIMAX model, along with its confidence interval. The graph shows that the differenced series tends to stabilize around zero over the forecast horizon, despite the high volatility observed in recent data. It is notable that, while the model captures the overall dynamics, the width of the confidence interval increases significantly as the forecast extends into the future, reflecting the inherent uncertainty of long-term prediction.

This linear regression model was established with the following formulation:

y_{t} = 13.5880 \times 10^{- 8} H P - 10.2443 \times 10^{- 8} T I E S + 13.2417 \times 10^{- 8} T E - 5.138 \times 10^{- 2} E O S

.

5.4.2. ARIMAX Model Based on Second Variable Selection

The prediction of the average unit value of the kilowatt-hour was obtained from a time series model that incorporated as an exogenous variable the estimates generated by the second variable selection in the linear regression model.

Figure 17 illustrates the evolution of actual values and the projection for the average kilowatt-hour price. In this representation, the forecasted series tends to converge toward an equilibrium value in the short and medium term, despite the high volatility evidenced in the recent data of the original series. The pink shaded band delimits the confidence interval of the prediction, whose width increases as the forecast horizon extends, reflecting the inherent uncertainty in future estimates.

This prediction is supported by the influence of key exogenous variables incorporated into the model, whose coefficients are derived from the underlying linear regression and are expressed in the following equation:

y_{t} = - 57.5587 \times 10^{- 8} F c C R - 543.3333 \times 10^{- 8} V c C R - 1.5060 \times 10^{- 8} C G N C - 1866.658 \times 10^{- 8} C T I E S - 16.244 \times 10^{- 2} C T F T

.

5.4.3. ARIMAX Model Based on Third Variable Selection

The prediction of the average unit value of the kilowatt-hour was derived from a time series model that incorporated as an exogenous variable the projections resulting from the third variable selection in the linear regression model.

Figure 18 displays the evolution of the measured average price values and the obtained future prediction. It can be observed that the forecasted series, despite recent volatility in the historical data, shows a tendency to stabilize over the prediction horizon. The confidence interval accompanying the point forecast progressively widens, reflecting the increase in uncertainty as the projection extends over time.

This prediction is supported by the influence of key exogenous variables incorporated into the model, whose coefficients are derived from the underlying linear regression and are expressed in the following equation:

y_{t} = 17.0082 \times 10^{- 8} H P - 10.4673 \times 10^{- 8} T I E S - 21.7532 \times 10^{- 8} T E + 3.566 C T I E S + 10.064 \times 10^{- 2} S R

.

5.4.4. ARIMAX Model Based on Variables with Highest Correlation

Figure 19 presents the time series of actual values of the average price, along with the forecast obtained from the ARIMAX model whose exogenous input is the regression on the variables with the highest correlation, accompanied by a confidence interval. The forecast suggests a stabilization trend or slight recovery after an initial decline, oscillating around values close to zero over the forecast period, although with increasing uncertainty as the forecast horizon extends, as indicated by the width of the confidence interval.

This prediction is supported by the influence of key exogenous variables incorporated into the model, whose coefficients are derived from the underlying linear regression and are expressed in the following equation:

y_{t} = 24.6184 \times 10^{- 8} H P + 98.9147 \times 10^{- 8} T I E S - 249.1820 \times 10^{- 8} T E - 2198.6252 \times 10^{- 8} F c C R - 54.769 \times 10^{- 8} V c C R + 7.146 C T I E S + 29.13 \times 10^{- 2} S R

.

5.4.5. ARIMAX Model Based on PCA Variables

The prediction of the average unit value of the kilowatt-hour was obtained from a time series model that incorporated as an exogenous variable the estimates generated by the linear regression model based on the variables selected from the PCA loadings. Figure 20 shows the historical series of actual values and the projection for the average price. Despite the volatility observed in recent data, the prediction suggests that the series tends to stabilize or show a slight recovery after an initial decline, oscillating around values close to zero over the forecast horizon. The confidence interval accompanying the prediction expands over time, indicative of the increasing uncertainty as the forecast horizon extends.

This prediction is supported by the influence of key exogenous variables incorporated into the model, whose coefficients are derived from the underlying linear regression and are expressed in the following equation:

y_{t} = - 13.479 \times 10^{- 6} T E - 6.4332 \times 10^{- 8} V c C R - 85.445 \times 10^{- 7} D C S U R + 38.9299 \times 10^{- 8} C T F T - 41.96001 \times 10^{- 7} E O S - 50.8526 \times 10^{- 3} C G N C

.

6. Discussion

According to the current regulatory framework governing wholesale electricity transactions in Ecuador (ARCERNNR Regulation 001/23), generation dispatch is executed on an hourly basis, yet commercial energy settlements are carried out monthly. Furthermore, the majority of electricity transactions are conducted through long-term regulated contracts between generation companies and distribution utilities, which are also settled monthly. For this reason, the present study focuses on modeling the monthly average unit cost of energy supply. Although spot market transactions—subject to hourly price volatility—exist in Ecuador, their share is marginal, as most of the energy is allocated via regulated agreements. Additionally, Ecuador has not experienced negative electricity prices, largely due to the low penetration of renewable sources, which currently account for only around 5% of total generation. This contextual specificity justifies the modeling approach based on monthly resolution and reinforces the relevance of variables such as hydropower, imports, and regulatory indices.

The implementation of three regression approaches—expert-driven selection, correlation-based selection, and PCA-based dimensionality reduction—was designed to evaluate trade-offs between interpretability, predictive performance, and robustness. While all models achieved statistically significant results, the correlation-based model outperformed others in terms of explanatory power (

R^{2} \approx 0.9887

), capturing nearly 99% of the variance in average energy price. However, given this near-perfect fit, the possibility of overfitting cannot be dismissed. The available dataset is limited in size, and no formal out-of-sample validation or cross-validation procedures were conducted. This remains a methodological limitation that future work should address by applying rolling-window or time-series cross-validation to test generalizability under different market conditions.
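One way such a validation scheme could be organized is sketched below. It only generates train and test index sets for a rolling-origin evaluation and does not fit any model. The function name, the series length of 84 months, the initial training length, and the horizon are all assumptions chosen for illustration.

```python
def rolling_origin_splits(n_obs, initial_train, horizon, step=1, window=None):
    """Yield (train_indices, test_indices) pairs that respect time order.

    With window=None the training set expands from index 0; otherwise only
    the most recent `window` observations before the forecast origin are kept.
    """
    splits = []
    origin = initial_train
    while origin + horizon <= n_obs:
        start = 0 if window is None else max(0, origin - window)
        train = list(range(start, origin))
        test = list(range(origin, origin + horizon))
        splits.append((train, test))
        origin += step
    return splits


# Assumed setup: 84 monthly observations, 60 months of initial training,
# 6-month forecast horizon, origin advanced 6 months at a time.
for train, test in rolling_origin_splits(84, initial_train=60, horizon=6, step=6):
    print(f"train 0..{train[-1]}  test {test[0]}..{test[-1]}")
```

Because every test block lies strictly after its training block, the scheme avoids the look-ahead leakage that ordinary shuffled cross-validation would introduce in a time series.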

We also acknowledge the potential for multicollinearity, especially given the extremely high correlation between some predictors. While variance inflation factors (VIFs) were not computed in this version of the study, two strategies were implemented to mitigate this issue: the application of PCA to isolate latent factors and reduce redundancy, and the comparative analysis of model performance metrics—such as adjusted

R^{2}

and statistical significance—to detect collinear effects. Future extensions should include formal diagnostic measures such as VIF and condition indices to further validate the regression models.
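As a rough indication of what a formal diagnostic might show, the sketch below computes the variance inflation factor for the simplest case of two predictors, where it reduces to 1 / (1 - r^2). Using the reported correlation of about 0.99 between Thermal Energy and Variable cost CR is an approximation, since that value is a rank correlation and the VIF is defined through a least-squares R^2. The function name is hypothetical.

```python
def vif_two_predictors(r):
    """VIF when a predictor is regressed on a single other predictor: 1 / (1 - r^2)."""
    return 1.0 / (1.0 - r ** 2)


# Approximate input: reported correlation between Thermal Energy and Variable cost CR.
print(round(vif_two_predictors(0.99), 2))
# For comparison, a moderate correlation gives a much smaller inflation.
print(round(vif_two_predictors(0.70), 2))
```

A value of this size is far above the rule-of-thumb threshold of 10 that is often used to flag problematic collinearity, which suggests that including both variables in one model would make their individual coefficients unstable.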

Beyond statistical performance, the proposed modeling framework holds practical value for decision-makers in the energy sector. First, the high accuracy of the correlation-based model enables monthly forecasts that can support operational planning, procurement decisions, and budget forecasting for distribution utilities. For example, energy suppliers can anticipate cost increases during drought periods (captured by hydropower reductions and TIES imports) and plan contractual strategies accordingly.

Second, the interpretability of the regression coefficients offers actionable insights for tariff design. The positive and statistically significant coefficient associated with the settlement ratio indicates that regulatory adjustments in cost reallocation have immediate and substantial effects on average prices. This enables regulatory agencies to simulate the impact of policy shifts or price band updates on end-user tariffs in advance.

Third, the high sensitivity of the model to thermal generation and cross-border imports highlights the risks associated with fossil-fuel dependency and limited diversification of the energy matrix. From a long-term planning perspective, the results underscore the importance of promoting renewable investment to reduce exposure to international price shocks and hydrological volatility.

To further contextualize the contribution of this study, it is important to compare the proposed models with existing techniques in the literature. While machine learning approaches such as NARX neural networks [15] and LSTM or XGBoost-based models [20] have achieved strong results in electricity price forecasting, they often require large volumes of high-frequency data and offer limited transparency for policymaking. In contrast, the regression and ARIMAX models developed in this study combine statistical rigor, interpretability, and moderate data requirements, making them well suited for structurally constrained markets such as Ecuador.

Moreover, while GARCH and stochastic volatility models [14] are effective for capturing price volatility in hourly or intraday markets, they are less compatible with the monthly cost settlement logic prevailing in the Ecuadorian system. The ARIMAX model used in this study, fed with exogenous predictors from the best-performing regression, enhances medium-term forecast capability without sacrificing transparency or policy alignment.
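
For reference, one common textbook parameterisation of an ARIMAX($p$, $d$, $q$) model with $K$ exogenous predictors (the regression-with-ARIMA-errors form) is sketched below. The symbols $\phi$, $\theta$, $\beta_k$, and $x_{k,t}$ are notation assumed here for illustration and are not taken from the estimated models; $y_t$ denotes the monthly average unit cost, $B$ the backshift operator, and $\varepsilon_t$ a white-noise error.

$$\phi(B)\,(1 - B)^{d}\left(y_t - \sum_{k=1}^{K} \beta_k\, x_{k,t}\right) = \theta(B)\,\varepsilon_t, \qquad \phi(B) = 1 - \sum_{i=1}^{p} \phi_i B^{i}, \quad \theta(B) = 1 + \sum_{j=1}^{q} \theta_j B^{j}$$

Under this reading, $d = 2$ would correspond to the twice-differenced price series described in the caption of Figure 15, and the $x_{k,t}$ would be the regressors of the selected linear model.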

In conclusion, the integrated use of multiple linear regression and ARIMAX modeling provides a robust, interpretable, and policy-relevant framework for forecasting energy supply costs in regulated and hydrologically sensitive electricity markets. These findings are valuable not only for Ecuador but also for other emerging economies with similar structural and regulatory characteristics.

7. Conclusions

This study presents a comprehensive modeling framework for estimating and forecasting the average unit cost of energy supply in Ecuador’s electricity distribution system using multiple linear regression and ARIMAX models. The analysis incorporates a diverse set of operational and economic variables, capturing the structural characteristics and regulatory dynamics of the national power market.

Empirical results reveal that the models exhibit strong explanatory power, with the best-performing linear regression achieving an $R^{2} \approx 0.9887$. Among the most influential variables, the settlement ratio, fixed and variable generation costs, and imported energy costs (TIES) demonstrate the strongest positive correlations with energy prices, while hydropower generation exhibits a strong negative association. These findings confirm the price-suppressing role of renewable energy sources and the significant cost pressure imposed by external supply and regulatory redistributions.

The modeling approach further highlights the importance of capturing crisis dynamics. The inclusion of 2023–2024 data, characterized by a major hydrological drought and increased reliance on high-cost imports, allows the models to reflect the volatility and regulatory adjustments faced by the Ecuadorian power system. Such integration strengthens the validity of the framework under both stable and disruptive conditions.

From a policy standpoint, the results underscore the urgent need to enhance system resilience through diversification of the energy mix, more flexible import contracts, and adaptive regulatory mechanisms that mitigate the cost volatility passed on to distribution companies and end users. The models developed here can serve as decision-support tools for tariff setting, risk planning, and energy procurement strategies.

As future work, we recommend the development of high-frequency models using daily or hourly data to better capture short-term volatility and operational shocks. Additionally, further studies could incorporate nonlinear modeling techniques, such as regime-switching models or machine learning algorithms, to explore potential threshold effects and improve predictive performance under extreme conditions. A backtesting exercise targeting specific crisis periods—such as the fourth quarter of 2023—will also be implemented to evaluate out-of-sample robustness.

In conclusion, this research provides a robust, interpretable, and context-sensitive approach to energy price forecasting, contributing both methodologically and practically to energy economics and planning in emerging markets.

Author Contributions

Conceptualization, P.A.M.-S. and L.F.G.-V.; methodology, L.F.G.-V.; validation, P.A.M.-S., L.F.G.-V., N.A.C.-R. and J.O.O.-O.; formal analysis, L.F.G.-V. and N.A.C.-R.; investigation, P.A.M.-S. and L.F.G.-V.; resources, J.O.O.-O. and P.A.C.-P.; data curation, L.F.G.-V., N.A.C.-R. and P.A.C.-P.; writing—original draft preparation, P.A.M.-S., N.A.C.-R. and P.A.C.-P.; writing—review and editing, L.F.G.-V., N.A.C.-R. and J.O.O.-O.; visualization, L.F.G.-V. and N.A.C.-R.; supervision, J.O.O.-O. and P.A.C.-P.; project administration, J.O.O.-O. and P.A.C.-P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

Author Pablo Alejandro Mendez-Santos was employed by the company Empresa Eléctrica Regional Centro Sur. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

PCA	Principal Component Analysis
EPF	Electricity Price Forecasting
AR	Autoregressive
ARX	Autoregressive with Exogenous Variables
GARCH	Generalized Autoregressive Conditional Heteroskedasticity
NARX	Nonlinear Autoregressive with Exogenous Variables
MIBEL	Mercado Ibérico de Electricidad
HPI	Hydroelectric Productivity Index
IPI	Industrial Production Index
LMP	Locational Marginal Price
AIC	Akaike Information Criterion
CEEMDAN	Complete Ensemble Empirical Mode Decomposition with Adaptive Noise
SV	Stochastic Volatility models
LSTM	Long Short-Term Memory
XGBoost	Extreme Gradient Boosting
DNN	Deep Neural Network
LASSO	Least Absolute Shrinkage and Selection Operator
LSTR	Logistic Smooth Transition Regression
ARCONEL	Ecuador’s Electricity Regulation and Control Agency (in Spanish: Agencia de Regulación y Control de Electricidad de Ecuador)
TIES	International Electricity Transactions (in Spanish: Transacciones Internacionales de Electricidad)
CR	Regulated Cost (in Spanish: Costo Regulado)
GNC	Non-Conventional Generation (in Spanish: Generación No Convencional)
TFT	Transmission Fixed Costs
ANOVA	Analysis of Variance
PC	Principal Component
CENTROSUR	South Central Regional Electric Company (in Spanish: Empresa Eléctrica Regional Centro Sur)
HP	Hydroelectric Power
TE	Thermal Energy
EOS	Energy from Other Sources
FcCR	Fixed Cost CR
VcCR	Variable Cost CR
CGNC	GNC Cost
CTIES	Cost TIES
CTFT	Cost TFT
SR	Settlement ratio
DCSUR	Demand
VIF	Variance Inflation Factor
CSUR	South Center

Appendix A

Electricity generation is the initial activity that enables energy production, which can be carried out using various primary sources such as renewable resources (water flow or head, wind, solar, biogas, etc.) or fossil fuels (natural gas, diesel, fuel oil, etc.) [26]. The energy production process largely depends on the generation dispatch performed by the System Operator, who determines the units and quantities of energy as part of the operational cost optimization process, in addition to monitoring the security and reliability of supply to the load.

Electricity transmission activity follows generation and consists of transporting energy over long distances, from production sites to distribution network substations to energize various loads and meet energy demand. The transmission stage involves the operation, administration, and management of all infrastructure required for energy transport, along with the associated costs, ensuring that electricity reaches regional distribution centers efficiently [26].

The electricity distribution stage constitutes the final phase in delivering energy to consumption points. This essential activity encompasses the comprehensive operation and management of network infrastructure, ensuring a safe and reliable electricity supply to all loads. A critical aspect of distribution is maintaining service quality and continuity parameters within the limits established by regulatory standards [27].

Within the Ecuadorian regulatory framework, electricity distribution companies are the entities responsible for ensuring supply to end users. To fulfill this obligation, they must procure energy in the wholesale market through regulated contracts signed with both public and private generation companies [28].

Consequently, the cost of energy transacted at the wholesale level by distribution companies is primarily composed of charges related to generation and transmission [25]. The cost structure of generation includes, among other elements, the return on invested capital and variable costs, the latter being directly proportional to the volume of electricity produced.

In the Ecuadorian wholesale electricity market, energy prices are directly influenced by the investment and operating costs associated with the various generation sources [29], with a particularly significant impact from hydropower and thermal technologies due to their predominance in the country’s generation matrix. This pricing structure, defined by current regulations governing wholesale commercial transactions, takes into account the inherent costs of each generation technology.

Specifically, within the generation cost component and in accordance with the current tariff structure [25], the following cost elements are considered: operation and maintenance of service assets, asset annuity (capital cost), environmental responsibility, and administrative expenses.

Another cost that directly affects the price is associated with generation from non-conventional renewable sources, such as energy produced from photovoltaic, wind, or biomass sources. In Ecuador, this type of generation still plays a marginal role in meeting national demand [22], and its commercial settlement is based on prices negotiated in concession contracts granted by the Ecuadorian state or on preferential conditions established in specific regulations that promote investment in such technologies.

Figure A1. Histograms and Q–Q plots for normality analysis of the variables: (a) TIES, (b) Thermal energy, (c) Energy from other sources.

Table A1. Variables of the principal component analysis.

	PC1	PC2	PC3	PC4	PC5	PC6	PC7	PC8	PC9	PC10
Demand (kWh/month)	0.05	−0.62	0.12	−0.02	−0.23	−0.50	0.06	0.29	−0.07	−0.16
Hydroelectric Power (kWh/month)	0.35	−0.14	0.08	0.58	0.23	−0.04	−0.25	0.03	−0.59	0.11
TIES (kWh/month)	−0.37	−0.32	−0.01	−0.15	0.03	−0.37	−0.04	0.01	0.09	0.27
Thermal Energy (kWh/month)	0.41	0.02	0.16	−0.09	−0.28	0.13	0.20	0.27	0.06	−0.58
Energy from other sources (kWh)	0.17	−0.15	−0.65	0.31	−0.50	−0.03	0.12	−0.38	0.13	0.06
Fixed cost CR (USD/month)	0.33	−0.01	−0.20	−0.52	−0.16	−0.01	−0.74	0.02	−0.08	0.08
Variable cost CR (USD/month)	0.41	−0.02	0.16	−0.09	−0.18	0.16	0.31	0.31	0.12	0.73
Cost GNC (USD/month)	−0.14	−0.38	−0.53	−0.13	0.32	0.48	0.12	0.41	−0.14	−0.06
Cost TIES (USD/month)	0.37	−0.15	−0.06	0.18	0.53	−0.15	−0.14	−0.01	0.69	−0.06
Cost TFT (USD/month)	0.17	−0.48	0.29	−0.29	0.14	0.29	0.17	−0.65	−0.11	−0.02
Settlement ratio	0.28	0.27	−0.32	−0.35	0.34	−0.48	0.42	−0.09	−0.31	−0.03
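
Loadings such as those in Table A1, together with the cumulative explained variance shown in Figure 7, can be obtained from a standard PCA on standardized variables. The sketch below is a minimal NumPy version under stated assumptions: the function names `pca_loadings` and `components_for_threshold` are hypothetical, and the data are synthetic (84 rows and 11 columns, chosen only to mimic the dimensions of a monthly 2018–2024 data set with eleven predictors), so the numbers it produces are unrelated to the table.

```python
import numpy as np

def pca_loadings(X):
    """Return (loadings, cumulative explained variance) for data X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    # Standardize each variable so that units (kWh, USD, ratios) are comparable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized data: the rows of Vt are the principal axes.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Transpose so that rows are variables and columns are components (PC1, PC2, ...).
    return Vt.T, np.cumsum(explained)

def components_for_threshold(cumulative, threshold=0.95):
    """Smallest number of components whose cumulative variance reaches the threshold."""
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative))

# Synthetic illustration only: three latent drivers mixed into eleven noisy variables.
rng = np.random.default_rng(1)
latent = rng.normal(size=(84, 3))
mixing = rng.normal(size=(3, 11))
X = latent @ mixing + 0.3 * rng.normal(size=(84, 11))

loadings, cumulative = pca_loadings(X)
k95 = components_for_threshold(cumulative)
print(k95, np.round(cumulative, 3))
```

The sign of each loading column is arbitrary in PCA, so a component may appear with all signs flipped relative to another implementation without changing its interpretation.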

Figure A2. Histograms and Q–Q plots for normality analysis of the variables: (a) Fixed cost CR, (b) Variable cost CR, (c) GNC cost.

Figure A3. Histograms and Q–Q plots for normality analysis of the variables: (a) Cost TIES, (b) Cost TFT, (c) Settlement ratio.

References

1. Contreras, J.; Espínola, R.; Nogales, F.J.; Conejo, A.J. ARIMA models to predict next-day electricity prices. IEEE Trans. Power Syst. 2003, 18, 1014–1020.
2. Aggarwal, S.K.; Saini, L.M.; Kumar, A. Electricity price forecasting in deregulated markets: A review and evaluation. Int. J. Electr. Power Energy Syst. 2009, 31, 13–22.
3. Ugurlu, U.; Oksuz, I.; Taş, O. Electricity Price Forecasting Using Recurrent Neural Networks. Energies 2018, 11, 1255.
4. Panapakidis, I.; Dagoumas, A. Day-ahead electricity price forecasting via the application of artificial neural network based models. Appl. Energy 2016, 172, 132–151.
5. Tschora, L.; Pierre, E.; Plantevit, M.; Robardet, C. Electricity price forecasting on the day-ahead market using machine learning. Appl. Energy 2022, 313, 118752.
6. Banitalebi, B.; Hoque, M.E.; Appadoo, S.S.; Thavaneswaran, A. Regularized Probabilistic Forecasting of Electricity Wholesale Price and Demand. In Proceedings of the 2020 IEEE Symposium Series on Computational Intelligence (SSCI), Canberra, Australia, 1–4 December 2020.
7. Jędrzejewski, A.; Marcjasz, G.; Weron, R. Importance of the Long-Term Seasonal Component in Day-Ahead Electricity Price Forecasting Revisited: Parameter-Rich Models Estimated via the LASSO. Energies 2021, 14, 3249.
8. Uniejewski, B.; Weron, R. Regularized quantile regression averaging for probabilistic electricity price forecasting. Energy Econ. 2021, 95, 105121.
9. Bissing, D.; Klein, M.T.; Chinnathambi, R.A.; Selvaraj, D.F.; Ranganathan, P. A Hybrid Regression Model for Day-Ahead Energy Price Forecasting. IEEE Access 2019, 7, 36833–36842.
10. Alkawaz, A.N.; Abdellatif, A.; Kanesan, J.; Khairuddin, A.S.M.; Gheni, H.M. Day-Ahead Electricity Price Forecasting Based on Hybrid Regression Model. IEEE Access 2022, 10, 108021–108033.
11. Dragasevic, Z.; Milovic, N.; Djurisic, V.; Backovic, T. Analyzing the factors influencing the formation of the price of electricity in the deregulated markets of developing countries. Energy Rep. 2021, 7, 937–949.
12. U.S. Energy Information Administration. Ecuador Has Continued to Expand Use of Hydroelectric Power. Today in Energy—EIA; 21 September 2023. Available online: https://www.eia.gov/todayinenergy/detail.php?id=60442 (accessed on 2 May 2025).
13. International Trade Administration. Ecuador—Electric Power and Renewable Energy. Country Commercial Guide; 8 February 2024. Available online: https://www.trade.gov/country-commercial-guides/ecuador-electric-power-and-renewable-energy (accessed on 2 May 2025).
14. Weron, R. Electricity price forecasting: A review of the state-of-the-art with a look into the future. Int. J. Forecast. 2014, 30, 1030–1081.
15. Agudelo, A.P.; López-Lezama, J.M.; Velilla, E. Predicción del precio de la electricidad en la bolsa mediante un modelo neuronal no-lineal autorregresivo con entradas exógenas. Inf. Tecnol. 2015, 26, 99–108.
16. Ramos, J.; Ferreira, Â.P.; Fernandes, P.O. Structural analysis, modeling and forecasting of electricity prices of the Iberian electricity market. In Proceedings of the I Ibero-American Congress of Smart Cities (ICSC-CITIES 2018), Soria, Spain, 26–27 September 2018; Universidad Santiago de Cali Publicaciones: Santiago de Cali, Colombia, 2018; pp. 452–464.
17. Zheng, K.; Wang, Y.; Liu, K.; Chen, Q. Locational marginal price forecasting: A componential and ensemble approach. IEEE Trans. Smart Grid 2020, 11, 4555–4564.
18. Ulgen, T.; Poyrazoglu, G. Predictor analysis for electricity price forecasting by multiple linear regression. In Proceedings of the 2020 International Symposium on Power Electronics, Electrical Drives, Automation and Motion (SPEEDAM), Sorrento, Italy, 24–26 June 2020; pp. 618–622.
19. Zhang, T.; Zhang, J.; Liu, Y.; Pan, S.; Sun, D.; Zhao, C. Design of Linear Regression Scheme in Real-Time Market Load Prediction for Power Market Participants. In Proceedings of the 2021 11th International Conference on Power and Energy Systems (ICPES), Shanghai, China, 18–20 December 2021; pp. 547–551.
20. Kapoor, G.; Wichitaksorn, N. Electricity price forecasting in New Zealand: A comparative analysis of statistical and machine learning models with feature selection. Appl. Energy 2023, 347, 121446.
21. Tröndle, T.; Lilliestam, J.; Marelli, S.; Pfenninger, S. Trade-offs between geographic scale, cost, and infrastructure requirements for fully renewable electricity in Europe. Joule 2020, 4, 1929–1948.
22. Vaca-Jiménez, S.; Gerbens-Leenes, P.; Nonhebel, S. Water-electricity nexus in Ecuador: The dynamics of the electricity’s blue water footprint. Sci. Total Environ. 2019, 696, 133959.
23. Ochoa, C.; van Ackere, A. Does size matter? Simulating electricity market coupling between Colombia and Ecuador. Renew. Sustain. Energy Rev. 2015, 50, 1108–1124.
24. Godoy, J.C.; Cajo, R.; Estrada, L.M.; Hamacher, T. Multi-criteria analysis for energy planning in Ecuador: Enhancing decision-making through comprehensive evaluation. Renew. Energy 2025, 241, 122278.
25. Garcés, L.M. Reglamento a Ley Orgánica del Servicio Público de Energía Eléctrica. 2019. Available online: https://www.recursosyenergia.gob.ec/wp-content/uploads/2023/02/LEY-ORGANICA-SERVICIOPUBLICO-ENERGIA-ELECTRICA.pdf (accessed on 2 May 2025).
26. Salvi, V.Z. Electricity Supply Chain Management—A Literature Review. Arch. Bus. Res. 2020, 8, 182–191.
27. Delavechia, R.P.; Ferraz, B.P.; Weiand, R.S.; Silveira, L.; Ramos, M.J.S.; dos Santos, L.L.C.; Bernardon, D.P.; Garcia, R.A.F. Electricity Supply Regulations in South America: A Review of Regulatory Aspects. Energies 2023, 16, 915.
28. Alexander, A.C.K.; Andrés, C.Y.Ó.; Benjamín, S.Y.G. A Proposed Methodology for the Calculation of Variable Production Costs in Hydropower Plants in Ecuador. In Proceedings of the 2023 IEEE Seventh Ecuador Technical Chapters Meeting (ECTM), Ambato, Ecuador, 10–13 October 2023; pp. 1–6.
29. Naranjo-Silva, S.; Punina-Guerrero, D.; Rivera-Gonzalez, L.; Escobar-Segovia, K.; Barros-Enriquez, J.D.; Almeida-Dominguez, J.A.; Alvarez del Castillo, J. Hydropower scenarios in the face of climate change in Ecuador. Sustainability 2023, 15, 10160.

Figure 1. Comparative monthly evolution of electricity demand in the South Central Regional Electric Company (CENTROSUR) (in kWh/month) and the average unit energy supply price (USD cents/kWh) during the 2018–2024 period.

Figure 2. Monthly evolution of the settlement ratio and the average unit energy supply price (USD cents/kWh) between 2018 and 2024.

Figure 3. Histograms and Q–Q plots for the normality analysis of the variables: (a) average price, (b) Demand, (c) Hydroelectric Power.

Figure 4. Boxplot showing the average price (USD cents/kWh) over the range of years (2018–2024). Boxes represent the interquartile range (IQR), the orange line indicates the median, and the whiskers extend to 1.5 times the IQR. The rhombus symbols denote outliers beyond this range.

Figure 5. Dunn’s test heat map with adjusted p-values between the years 2018 and 2024.

Figure 6. Heatmap of dependent and independent variables.

Figure 7. Cumulative explained variance as a function of the number of principal components. The red dashed line represents the 95% threshold commonly used to retain sufficient variance in dimensionality reduction.

Figure 8. Linear regression model using variables of Hydroelectric Power (kWh/month), TIES (kWh/month), Thermal Energy (kWh/month), and Energy from other sources (kWh/month).

Figure 9. Linear regression model using variables of Fixed cost CR (USD/month) (FcCR), Variable cost CR (USD/month) (VcCR), GNC cost (USD/month) (CGNC), Cost TIES (USD/month) (CTIES), and Cost TFT (USD/month) (CTFT).

Figure 10. Linear regression model using variables of Hydroelectric Power (HP), TIES (TIES), Thermal Energy (TE), Cost TIES (CTIES), and the Settlement ratio (SR).

Figure 11. Linear regression model using the highest correlation analysis with variables Hydroelectric Power (kWh/month) (HP), TIES (TIES), Thermal Energy (kWh/month) (TE), Fixed cost CR (USD/month) (FcCR), Variable cost CR (USD/month) (VcCR), Cost TIES (USD/month) (CTIES), and the Settlement ratio (SR).

Figure 12. Linear regression model using principal component analysis with variables Demand (kWh/month) (DCSUR), Thermal Energy (kWh/month) (TE), Energy from other sources (kWh) (EOS), Variable cost CR (USD/month) (VcCR), GNC cost (USD/month) (CGNC), and Cost TFT (USD/month) (CTFT).

Figure 13. Linear regression model based on the first three principal components (PC1, PC2, and PC3).

Figure 14. Original time series of the average kilowatt-hour price, showing a trend suggesting non-stationarity.

Figure 15. Time series of the average kilowatt-hour price differenced twice to achieve stationarity.

Figure 16. Prediction of the average kilowatt-hour price using an ARIMAX model incorporating variables from the first hypothesis of the linear regression model. The blue line represents measured values, while the orange line indicates forecasted values. The shaded area around the forecast represents the 95% confidence interval.

Figure 17. Prediction of the average kilowatt-hour price using an ARIMAX model incorporating variables from the second hypothesis of the linear regression model. The blue line represents measured values, while the orange line indicates forecasted values. The shaded area around the forecast represents the 95% confidence interval.

Figure 18. Prediction of the average kilowatt-hour price using an ARIMAX model incorporating variables from the third hypothesis of the linear regression model. The blue line represents measured values, while the orange line indicates forecasted values. The shaded area around the forecast represents the 95% confidence interval.

Figure 19. Prediction of the average kilowatt-hour price using an ARIMAX model incorporating variables with the highest correlation from the linear regression model. The blue line represents measured values, while the orange line indicates forecasted values. The shaded area around the forecast represents the 95% confidence interval.

Figure 20. Prediction of the average kilowatt-hour price using an ARIMAX model incorporating variables with the highest correlation from the linear regression model. The blue line represents measured values, while the orange line indicates forecasted values. The shaded area around the forecast represents the 95% confidence interval.

Table 1. Summary of related work on energy price modeling and forecasting methods.

Ref.	Approach	Methodology	Results
[14]	Comprehensive review of electricity price forecasting (EPF) methodologies.	Systematic review of forecasting models: from linear methods (AR, ARX) to nonlinear approaches (GARCH, neural networks, etc.).	Provides a comprehensive overview of existing forecasting approaches and suggests directions for future research.
[15]	Energy price forecasting in the Colombian market using neural network techniques.	Nonlinear autoregressive neural network model with exogenous inputs (NARX), incorporating variables such as hydrology, demand, and El Niño phenomenon.	The NARX model achieved good predictive fit, maintaining accuracy even outside the training period (robust generalization).
[16]	Structural modeling of monthly electricity prices in the MIBEL market (Spain/Portugal) using economic and climatic variables.	Multiple linear regression with exogenous variables (e.g., economic indices HPI/IPI and heating/cooling degree days) to explain monthly price variation.	The model explained approximately 53% of price variability in Portugal and 29% in Spain; it identified that economic indicators (HPI, IPI) negatively influence prices.
[17]	Locational marginal price (LMP) forecasting by decomposing the problem into price subcomponents.	Decomposition of LMP into components (energy, congestion, losses) and use of separate models for each component, combined into an ensemble forecast.	The strategy of decomposing the price into components and forecasting them individually resulted in more accurate and robust LMP predictions.
[18]	Analysis of key predictors for energy price forecasting in the Turkish market.	Multiple linear regression incorporating lagged prices and exogenous variables (fuel prices such as gas, oil, coal) in a dynamic framework.	Including past prices and fuel variables improved forecasting accuracy; these predictors proved significant for estimating electricity prices.
[19]	Real-time electricity load forecasting using an optimized linear regression scheme.	Simplified multiple linear regression, optimized using statistical criteria (AIC) and influence analysis (Cook’s distance) to improve robustness.	Despite its simplicity, the linear model achieved effective load predictions using only public data, demonstrating the usefulness of a parsimonious approach.
[20]	Electricity price forecasting in New Zealand comparing traditional statistical models with machine learning techniques.	Model comparison: includes statistical methods (GARCH, stochastic volatility) and machine learning techniques (LSTM, XGBoost, DNN), using LASSO for relevant variable selection.	In the analyzed case, statistical models (GARCH, SV) supported by LASSO-based variable selection outperformed more complex machine learning models in forecasting accuracy.

Table 2. Variables of the principal component analysis.

	PC1	PC2	PC3
Demand (kWh/month)	0.2829	0.5500	−0.3011
Thermal Energy (kWh/month)	0.5804	−0.2297	0.1432
Energy from other sources (kWh)	0.1865	0.2839	0.8025
Variable cost CR (USD/month)	0.5859	−0.1925	0.1178
Cost GNC (USD/month)	−0.1657	0.6409	0.2176
Cost TFT (USD/month)	0.4215	0.3411	−0.4284

Table 3. Linear models implemented for determining the average unit energy supply price.

Model	$R^{2}$	p-Value	Variables	Equation
First variable selection	0.7035	$4.0 \times 10^{-20}$	Hydroelectric power, TIES, Thermal energy, and Energy from other sources	$C = 7.09 - 7 \times 10^{-8} \cdot \mathrm{HP} + 4 \times 10^{-8} \cdot \mathrm{TIES} + 7 \times 10^{-8} \cdot \mathrm{TE} + 28 \times 10^{-8} \cdot \mathrm{EOS}$
Second variable selection	0.8961	$7.2 \times 10^{-37}$	Fixed cost, Variable cost, GNC cost, Cost TIES, and Cost TFT	$C = -0.07 + 236 \times 10^{-8} \cdot \mathrm{FcCR} + 100 \times 10^{-8} \cdot \mathrm{VcCR} - 30 \times 10^{-8} \cdot \mathrm{CGNC} + 115 \times 10^{-8} \cdot \mathrm{CTIES} - 373 \times 10^{-8} \cdot \mathrm{CTFT}$
Third variable selection	0.9654	$4.3 \times 10^{-54}$	Hydroelectric power, TIES, Thermal energy, Cost TIES, and Settlement ratio	$C = 2.29 - 3 \times 10^{-8} \cdot \mathrm{HP} - 9 \times 10^{-8} \cdot \mathrm{TIES} + 2 \times 10^{-8} \cdot \mathrm{TE} + 4 \times 10^{-8} \cdot \mathrm{EOS} + 120 \times 10^{-8} \cdot \mathrm{CTIES} + 4.34 \cdot \mathrm{SR}$
Highest correlation	0.9887	$2.8 \times 10^{-71}$	Hydroelectric power, TIES, Thermal energy, Fixed cost, Variable cost, Cost TIES, and Settlement ratio	$C = 0.81 - 4 \times 10^{-8} \cdot \mathrm{HP} - 5 \times 10^{-8} \cdot \mathrm{TIES} - 6 \times 10^{-8} \cdot \mathrm{TE} + 153 \times 10^{-8} \cdot \mathrm{FcCR} + 112 \times 10^{-8} \cdot \mathrm{VcCR} + 98 \times 10^{-8} \cdot \mathrm{CTIES} + 3.76 \cdot \mathrm{SR}$
Principal component variables	0.7135	$5.2 \times 10^{-19}$	CSUR Demand, Thermal energy, Energy from other sources, Variable cost, GNC cost, and Cost TFT	$C = 6.19 + 1 \times 10^{-8} \cdot \mathrm{TE} + 330 \times 10^{-8} \cdot \mathrm{VcCR} - 9 \times 10^{-8} \cdot \mathrm{DCSUR} - 175 \times 10^{-8} \cdot \mathrm{CTFT} + 40 \times 10^{-8} \cdot \mathrm{EOS} + 203 \times 10^{-8} \cdot \mathrm{CGNC}$
Principal components	0.6504	$3.23 \times 10^{-18}$	PC1, PC2, and PC3	$y = 2.77 + 0.68 \cdot \mathrm{PC1} - 0.29 \cdot \mathrm{PC2} + 0.57 \cdot \mathrm{PC3}$
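
To show how an equation from Table 3 can be used for a simple what-if analysis, the sketch below evaluates the "Highest correlation" model with the coefficients as printed in the table. The function name `predict_cost` and the input values are hypothetical and chosen only for illustration; they are not observations from the study. The output unit is presumably USD cents/kWh, as in the figure captions, which is an assumption.

```python
# Coefficients of the "Highest correlation" model, copied from Table 3.
INTERCEPT = 0.81
COEFFS = {
    "HP": -4e-8,      # Hydroelectric Power (kWh/month)
    "TIES": -5e-8,    # International Electricity Transactions (kWh/month)
    "TE": -6e-8,      # Thermal Energy (kWh/month)
    "FcCR": 153e-8,   # Fixed cost CR (USD/month)
    "VcCR": 112e-8,   # Variable cost CR (USD/month)
    "CTIES": 98e-8,   # Cost TIES (USD/month)
    "SR": 3.76,       # Settlement ratio (dimensionless)
}

def predict_cost(inputs):
    """Evaluate the linear model C = intercept + sum(coefficient * variable)."""
    missing = set(COEFFS) - set(inputs)
    if missing:
        raise KeyError(f"missing model inputs: {sorted(missing)}")
    return INTERCEPT + sum(COEFFS[name] * inputs[name] for name in COEFFS)

# Hypothetical month (illustrative magnitudes only, not data from the article).
base = {"HP": 50e6, "TIES": 5e6, "TE": 10e6,
        "FcCR": 1e6, "VcCR": 5e5, "CTIES": 3e5, "SR": 0.5}
# Same month with the settlement ratio raised by 0.1.
shifted = dict(base, SR=base["SR"] + 0.1)

delta = predict_cost(shifted) - predict_cost(base)
print(round(predict_cost(base), 3), round(delta, 3))
```

Because the model is linear, the change caused by a shift in one variable equals its coefficient times the shift, independent of the other inputs; a 0.1 increase in the settlement ratio therefore moves the predicted cost by 0.376 in the model's output unit.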

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

