Data-Driven Forecasting of Electricity Prices in Chile Using Machine Learning

León, Ricardo; Ramírez, Guillermo; Cifuentes, Camilo; Vergara, Samuel; Aedo-García, Roberto; Ramis Lanyon, Francisco; Villalobos San Martin, Rodrigo J.

doi:10.3390/app16031318


by Ricardo León ^1,2,*, Guillermo Ramírez ^1,2, Camilo Cifuentes ^1, Samuel Vergara ^1,2, Roberto Aedo-García ^3,*, Francisco Ramis Lanyon ^4 and Rodrigo J. Villalobos San Martin ^5

^1 Departamento de Ingeniería Eléctrica, Universidad Católica de la Santísima Concepción, Alonso de Ribera 2850, Concepción 4090541, Chile

^2 Centro de Energía, Universidad Católica de la Santísima Concepción, Alonso de Ribera 2850, Concepción 4090541, Chile

^3 Department of Physics, School of Science, Universidad del Bío-Bío, Av. Collao 1202, Concepción 4051381, Chile

^4 Department of Industrial Engineering, School of Engineering, Universidad del Bío-Bío, Av. Collao 1202, Concepción 4051381, Chile

^5 Departamento de Ingeniería Eléctrica, Facultad de Ingeniería y Ciencias, Universidad de La Frontera, Temuco 5110500, Chile

* Authors to whom correspondence should be addressed.

Appl. Sci. 2026, 16(3), 1318; https://doi.org/10.3390/app16031318

Submission received: 17 December 2025 / Revised: 21 January 2026 / Accepted: 23 January 2026 / Published: 28 January 2026

(This article belongs to the Special Issue New Trends in Renewable Energy and Power Systems)


Abstract

This study proposes and evaluates a data-driven framework for short-term System Marginal Price (SMP) forecasting in the Chilean National Electric System (NES), a power system characterized by high penetration of variable renewable generation and persistent transmission congestion. Using publicly available hourly operational data for 2024, multiple machine learning regressors, including Linear Regression (baseline), Bayesian Ridge, Automatic Relevance Determination, Decision Trees, Random Forests, and Support Vector Regression, are implemented under a node-specific modeling strategy. Two alternative approaches for predictor selection are compared: a system-wide methodology (M1) that exploits lagged SMP information from all network nodes, and a spatially filtered methodology (M2) that restricts SMP inputs to correlated subsystems identified through nodal correlation analysis. Model robustness is explicitly assessed by reserving January and July as out-of-sample test periods, capturing contrasting summer and winter operating conditions. Forecasting performance is analyzed for representative nodes located in the northern, central, and southern zones of the NES, which exhibit markedly different congestion levels and generation mixes. Results indicate that non-linear and ensemble models, particularly Random Forest and Support Vector Regression, provide the most accurate forecasts in well-connected areas, achieving mean absolute errors close to 10 USD/MWh. In contrast, forecast errors increase substantially in highly congested southern zones, reflecting the structural influence of transmission constraints on price formation. While average performance differences between M1 and M2 are modest, a paired Wilcoxon signed-rank test reveals statistically significant improvements with M2 in highly congested zones, where M2 yields lower absolute errors for most models despite relying on fewer inputs.
These findings highlight the importance of congestion-aware feature selection for reliable price forecasting in renewable-intensive systems.

Keywords:

system marginal price; forecast; variable renewable generation; artificial intelligence; electric energy price forecast; machine learning models

1. Introduction

Electricity serves as a foundational driver of economic development, social stability, and national competitiveness. Precise forecasting of the price of electricity, also known as the System Marginal Price (SMP), is a critical issue for market participants and regulatory authorities [1,2,3], as it provides a crucial signal for both short-term market operations and long-term investment planning. System operators focus on minimizing costs and maintaining reliability, while market participants seek to anticipate SMP trends to optimize bidding strategies and enhance revenue streams. In liberalized and increasingly competitive power markets, forecasting accuracy is vital for operational planning, strategic decision-making, and robust risk management [4,5].

Traditionally, SMP has been determined through cost-based economic dispatch, assuming stable and predictable demand. The global energy transition, marked by the rapid integration of renewable energy sources and the ongoing decarbonization of power systems, has introduced non-linearity, intermittency, and heightened uncertainty into SMP dynamics, thereby increasing the complexity of the forecasting process [6,7,8]. Empirical studies have identified numerous interconnected factors influencing price dynamics, including fossil fuel prices, hydroelectric capacity, market structure, transmission limitations, extreme weather events, and geopolitical disruptions [9,10].

In Chile, the National Electric System (NES) operates within a liberalized and decentralized framework comprising regulated segments (transmission and distribution) and unregulated segments (generation and commercialization), all overseen by the National Electricity Coordinator (NEC). The NES supplies electricity to approximately 98.5% of the Chilean population and extends over 3100 km. By March 2025, the national transmission network covered 39,571.87 km, and the net installed generation capacity of the system was 37.39 GW [11,12]. The main nodes of the NES and its geographical shape are shown in Figure 1.

The NES has seen rapid changes in the last few years, with more than 30% of the installed capacity now sourced from Non-Conventional Renewable Energy (NCRE), primarily solar photovoltaics and wind power [13,14,15]. This progress is the result of public policies designed to promote sustainable development in the power sector [16]. By April 2025, variable renewable generation (VRG), including solar and wind, reached an installed capacity of 18.98 GW (around half of the total installed capacity). The country aims to meet 90% of its electricity demand with clean energy sources by 2050 [17].

This transition has introduced significant operational and economic challenges. In particular, the complexity of SMP forecasting in the NES stems from its main characteristics: its long topology, different generation and demand mixes per zone, increasing transmission bottlenecks, and lack of interconnection with other major grids. Similar characteristics have been reported in [2,4,5,6,18,19,20,21]. One growing issue in the NES is transmission bottlenecks, shown in Figure 2 for the main nodes of the NES in 2024, ordered from north to south [12]. Following [12], congestion hours are estimated as the total number of hours in which the SMP at a busbar differs from that at other busbars by 7% or more. Table 1 shows the total SMP decoupling for January and July 2024, including the average variation in SMP values.
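The 7% congestion criterion can be illustrated with a minimal sketch for a single pair of busbars. The text does not specify how the relative deviation is normalized or how zero-price hours are handled, so the sketch assumes that the price difference is divided by the larger of the two absolute prices and that hours in which both prices are zero count as coupled. The function name `decoupling_hours` and the sample prices are hypothetical.

```python
import numpy as np


def decoupling_hours(smp_bus, smp_ref, threshold=0.07):
    """Count hours in which the SMPs of two busbars differ by >= threshold.

    Assumption (not taken from [12]): the relative difference is measured
    against the larger of the two absolute prices, and hours in which both
    prices are zero are treated as coupled.
    """
    a = np.asarray(smp_bus, dtype=float)
    b = np.asarray(smp_ref, dtype=float)
    scale = np.maximum(np.abs(a), np.abs(b))
    # Where both prices are 0 USD/MWh, leave the relative difference at 0.
    rel_diff = np.divide(
        np.abs(a - b), scale, out=np.zeros_like(scale), where=scale > 0
    )
    return int(np.count_nonzero(rel_diff >= threshold))


# Hypothetical hourly SMPs (USD/MWh) at a busbar and at a reference busbar
bus = [0.0, 50.0, 60.0, 100.0, 0.0]
ref = [0.0, 52.0, 80.0, 90.0, 10.0]
print(decoupling_hours(bus, ref))  # 3
```

Normalizing by the larger price keeps the measure symmetric between the two busbars; dividing by the reference price instead would give slightly different counts near the threshold.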

The transmission constraints shown in Figure 2 result in an increase in hours of SMP decoupling, as shown in Table 1, and in several hours during the day with prices of USD 0/MWh, as illustrated for 1 March 2024 in Figure 3.

As the share of VRG in the NES has increased over the past few years, so has its curtailment due to transmission constraints and generation surplus, particularly during the daytime. A report by the Association of Generators [12] estimated that, by January 2025, accumulated energy curtailment would reach an annual value of 614.4 GWh, an increase of 18% compared with the same month of the previous year.

In this paper, we address the challenge of developing and evaluating a data-driven SMP forecasting framework tailored to a renewable-intensive electricity market with high levels of transmission-induced decoupling. This study uses historical datasets from the NEC, including hourly time series of the instantaneous SMP and exogenous variables, namely electricity generation disaggregated by technology type (NCRE, hydro, and thermal) and the hourly net demand profile of the system, as shown for one day in Figure 4. The key novelty of this work lies in analyzing SMP forecasting under high levels of decoupling due to transmission constraints by comparing two methodologies for selecting predictor variables: M1, using information from all NES nodes; and M2, using subsystem-specific variables derived from SMP clustering.

1.1. State of the Art

A substantial body of literature has examined a wide range of methodologies for forecasting electricity prices [2,4,18,19,21,22,23,24,25]. Forecasting methods are generally classified by horizon into short-term, medium-term, and long-term approaches [19], spanning intra-day predictions to multiyear strategic projections. Another common classification differentiates between market simulation-based approaches and data-driven models [2,26]. Market simulation-based methods employ structural tools such as unit commitment and economic dispatch, whereas data-driven models infer price behavior from historical data without explicit system optimization. Further, system dynamics modeling by Zhao et al. [27] demonstrated that policy can reshape market behavior, influence the long-term supply mix, and modify the trajectory of renewable energy deployment.

Recently, data-driven strategies, particularly those utilizing artificial intelligence (AI), have become increasingly prominent [1,2,4,18,19,21,22,23,24,25,26,28,29,30,31]. These approaches encompass artificial neural networks (ANN), recurrent neural networks (RNN), ARIMA models, and hybrid statistical–machine learning (ML) systems [2,4,18,19,20,21]. ML-based frameworks offer several advantages, including the ability to process large-scale, multivariate inputs, capture latent data interactions, and autonomously identify relevant features even in the presence of multicollinearity [2,20,21,26]. Long short-term memory (LSTM) and related architectures are especially effective at modeling temporal dependencies and volatility clustering, and have outperformed classical methods in case studies from Australia [23], Belgium [1], and Turkey [25], among other markets.

Hybrid and ensemble techniques have been shown to further enhance forecasting accuracy [2,4,5,6,18,19,20,21]. For instance, Yang et al. [20] combined backpropagation neural networks with adaptive neuro-fuzzy inference systems (ANFIS) to jointly model linear and non-linear dynamics. Alshater et al. [2] integrated machine learning and regression techniques to address energy equity price modeling under high uncertainty. Additionally, Billé et al. [4] employed ARFIMA-GARCH structures, highlighting the advantages of incorporating time-varying volatility.

SMP dynamics are strongly influenced by grid topology and transmission constraints, particularly in nodal or congestion-prone markets [19,23]. Seasonal patterns related to demand cycles and renewable resource availability are also shown to significantly affect forecast accuracy, necessitating explicit temporal feature engineering [5,19]. The impact of VRG is twofold; while renewable generation reduces marginal costs and increases the occurrence of low-price regimes, it also amplifies volatility and uncertainty due to intermittency [22,24]. ML models that fail to incorporate renewable forecasts or grid operational variables tend to exhibit degraded performance under high-renewable-penetration scenarios [7,10]. Hybrid and ensemble approaches have been proposed to mitigate these issues by combining models optimized for different regimes [7,9,30].

A primary challenge in electricity price forecasting for systems such as the NES is the influence of transmission constraints. Losses and physical congestion reduce dispatch efficiency, creating substantial price disparities across network nodes. Limited transfer capability causes the SMP to become spatially decoupled, reflecting local operating conditions. Some authors characterize this decoupling as a stochastic phenomenon [1,21], an assumption that may be valid in well-interconnected meshed systems. However, in systems like the NES, with no major interconnection and an extremely radial topology, where renewable generation has expanded more rapidly than transmission infrastructure, congestion has become structural and persistent.

These conditions require specialized approaches, including data segmentation, non-linear transformations, and congestion-aware feature engineering [18]. For instance, Díaz et al. [8] demonstrated that, in the Spanish power system, network topology introduces non-linearities that diminish model performance if spatial information is disregarded. SMP volatility caused by grid constraints introduces non-linear dynamics, spatial heterogeneity, and rare events, all of which present significant challenges for machine learning models. Zheng et al. [27] proposed a component- and group-based framework that decomposes locational marginal prices (LMPs) into interpretable subcomponents, improving nodal price prediction under congestion. Similar nodal and reduced-spatial-input strategies are explored in [3], where spatial filtering techniques are employed to limit dimensionality while retaining relevant network signals.

To mitigate the effect of transmission constraints, researchers recommend integrating congestion indicators, rare-event detection methods, and hybrid physical and data-driven modeling approaches [2,4,5,6,18,19,20,21].

Additionally, the authors in [28] find that forecast performance can differ substantially between regions within the same system, underscoring the importance of spatially coherent subsystems.

Recent advances in deep learning have highlighted attention-based mechanisms as a way to capture long-range temporal dependencies and regime shifts commonly observed in electricity markets, particularly under renewable integration. In electricity price forecasting, Laitsos et al. [10] proposed short-term pipelines that incorporate attention within deep models (e.g., CNN and CNN–GRU variants), reporting competitive precision and emphasizing attention for extracting informative temporal patterns [32]. In parallel, graph-based Transformer models have been adopted to represent spatial dependencies among nodes; for example, Zhu et al. [33] introduced a spatio-temporal dynamic graph Transformer for short-term load forecasting, combining graph learning/convolution with multiscale self-attention to model dynamic inter-node relationships and temporal structure. These developments motivate incorporating congestion and spatial awareness into data-driven forecasting. In this context, our study keeps the modeling backbone intentionally lightweight and interpretable by comparing two SMP input selection strategies, M1 (system-wide lagged SMP inputs) and M2 (correlation-based spatial filtering of SMP inputs), thus isolating the impact of spatial information selection on forecast performance in a transmission-constrained, renewable-intensive system.

1.2. Novel Contributions and Structure of This Work

The main contribution of this work is an evaluation of several data-driven ML methods for forecasting marginal prices in the NES using public data from the Chilean system operator (NEC). ML models were trained using two methodologies for the selection of predictor variables: M1, using information from all NES nodes; and M2, using subsystem-specific variables derived from SMP clustering. SMP forecasting is particularly challenging because the Chilean NES has unique characteristics: a long geographic extension (over 3100 km), absence of international interconnections, rapid VRG growth, and delayed transmission expansion, which has led to transmission constraints.

The findings highlight how the performance of the ML methods for SMP forecasting is affected by the physical characteristics of the power system, seasonal variability, and network constraints, as well as the importance of selecting appropriate ML methods and the impact of the volume and quality of the data fed to the models.

The article is structured as follows: Section 2 describes the methodology and ML models used for SMP forecasting. Section 3 presents and analyzes the forecast results. Section 4 concludes with key findings and proposes future work.

2. Materials and Methods

2.1. General

To address the radial configuration of the NES and the challenges associated with SMP forecasting, two methodological approaches are considered. In Methodology 1 (M1), the models are trained using historical information comprising generation by technology type, net demand, and SMP values from all system busbars, incorporating a fixed set of five hourly lagged observations (from t − 1 to t − 5). The number of lags was determined through a preliminary empirical analysis in which configurations with fewer lagged inputs were evaluated using validation performance metrics; predictive performance saturated at five lags. Configurations with more than five lags were therefore not explored, as further gains were not expected to justify the substantially longer training times. The same lag structure was applied across all zones and nodes to ensure comparability among models. In Methodology 2 (M2), the same temporal configuration is retained; however, the SMP inputs correspond to a reduced subset of busbars selected through a correlation-based feature selection process, as described in Section 2.5.
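The lag structure shared by M1 and M2 can be sketched with pandas. The sketch assumes an hourly `DataFrame` whose columns hold busbar SMPs and the exogenous series and, because the text does not say otherwise, lags every input (SMP, generation, and net demand) from t − 1 to t − 5 while predicting the SMP at time t. The function and column names are hypothetical.

```python
import pandas as pd


def build_lagged_features(df, price_cols, exog_cols, target_col, n_lags=5):
    """Return (X, y) where X holds lags t-1..t-n_lags of every input column.

    Hypothetical helper: M1 would pass all busbar SMP columns as price_cols,
    M2 only the correlation-selected subset; exog_cols (generation by
    technology, net demand) are the same in both methodologies.
    """
    lagged = {}
    for col in list(price_cols) + list(exog_cols):
        for k in range(1, n_lags + 1):
            # shift(k) aligns the value observed k hours earlier with hour t
            lagged[f"{col}_lag{k}"] = df[col].shift(k)
    X = pd.DataFrame(lagged, index=df.index)
    y = df[target_col]
    # The first n_lags hours lack a complete history and are dropped
    complete = X.notna().all(axis=1)
    return X[complete], y[complete]


# Hypothetical toy data: SMP at one busbar and net demand, 10 hourly steps
df = pd.DataFrame({"smp_A": range(10), "net_demand": range(100, 110)})
X, y = build_lagged_features(df, ["smp_A"], ["net_demand"], "smp_A")
print(X.shape)  # (5, 10)
```

Only the `price_cols` argument changes between the two methodologies, which mirrors the design goal of isolating the effect of the SMP input selection.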

The difference in the amount of data fed to the models in M1 and M2 aims to determine whether restricting M2 to strategically selected information can improve model interpretability, training efficiency, and forecasting performance, particularly when conditions such as seasonality or transmission constraints degrade forecast performance under M1. A potential drawback is that limited data can hinder the effectiveness of ML models through underfitting and increased sensitivity to noise. In addition to busbar marginal price data, system-level variables such as total system demand and injected power from various generation sources, including solar, wind, and thermal generation, were incorporated. These supplementary features aim to improve model performance by capturing broader system dynamics that influence price formation.

2.2. Models

The machine learning models implemented for SMP forecasting in this study are the Decision Tree Regressor (DTR), Random Forest Regressor (RFR), Support Vector Regressor (SVR), Bayesian Ridge Regressor (BR), and Automatic Relevance Determination Regressor (ARD). Linear Regression (LR) is used as a baseline against which the forecast performance of these ML methods is compared.

Figure 5 presents the methodology for model construction applied in both M1 and M2. The dataset fed to the methodology of Figure 5 was obtained from the NEC’s website [34], where hourly values are available. The dataset was structured into a predictor set (input features) and a target variable, defined as the SMP at a specific bar and time step.

2.3. Data Splitting and Preprocessing

For model development, the available observations were divided into a training set (6854 samples) and an out-of-sample test set (1478 samples). The training set comprised all available data excluding the months of January and July, which were intentionally withheld and reserved exclusively for testing in order to assess seasonal generalization under contrasting operating conditions, namely, austral summer (January) and austral winter (July). Within the training set, a random split was applied, allocating 80% of the data for model fitting and the remaining 20% for validation. No feature scaling or normalization was applied to the input variables prior to model training, and all models were trained using the original data scales to ensure a consistent preprocessing pipeline across methodologies.
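The splitting scheme, a seasonal hold-out followed by a random 80/20 split of the remaining hours, can be sketched with scikit-learn's `train_test_split`. The sketch assumes that features and target share an hourly `DatetimeIndex`; the helper name `seasonal_split`, the fixed `seed`, and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def seasonal_split(X, y, test_months=(1, 7), val_size=0.2, seed=0):
    """Hold out entire months for testing, then split the rest at random.

    X and y must share an hourly DatetimeIndex. January and July form the
    out-of-sample test set; the remaining hours are split 80/20 into
    fitting and validation subsets.
    """
    is_test = np.asarray(X.index.month.isin(test_months))
    X_test, y_test = X[is_test], y[is_test]
    X_fit, X_val, y_fit, y_val = train_test_split(
        X[~is_test], y[~is_test], test_size=val_size, random_state=seed
    )
    return X_fit, X_val, y_fit, y_val, X_test, y_test


# Hypothetical hourly index covering 1 January to 13 December 2024
idx = pd.date_range("2024-01-01", periods=8352, freq="60min")
X = pd.DataFrame({"feature": np.arange(len(idx))}, index=idx)
y = pd.Series(np.arange(len(idx), dtype=float), index=idx)
X_fit, X_val, y_fit, y_val, X_test, y_test = seasonal_split(X, y)
```

The random split interleaves validation hours with fitting hours; it is the month-level hold-out that guarantees the test months are never seen during training.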

As shown in Figure 5, the data preparation stage includes two alternative data segmentation schemes. In Methodology 1 (M1), the model inputs include lagged SMP values from all system bars over the previous one to five hours, together with generation aggregated by technology and net demand. Methodology 2 (M2) preserves the same temporal structure and the same non-price predictors but replaces the full set of SMP inputs with a selected subset of bars. This design isolates the effect of using system-wide price information versus geographically targeted SMP signals, allowing a direct assessment of how spatial aggregation (M1) or selective localization (M2) propagates into forecasting performance.

The training strategy adopted in this study is node-specific, meaning that an independent machine learning model is developed for each target busbar. Rather than fitting a single global model across all nodes, each busbar is treated as an individual prediction target, with its corresponding SMP time series and associated explanatory variables used to train, validate, and test a dedicated model. Model performance was quantified using the evaluation metrics defined in Section 2.9.
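The node-specific strategy amounts to fitting one fresh estimator per target busbar. The sketch below uses scikit-learn's `clone` so that each busbar receives an unfitted copy of the same configuration; the helper name is hypothetical, the busbar names are borrowed from Section 3.2 for illustration only, and the data are synthetic.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression


def train_node_specific(datasets, base_model):
    """Fit an independent copy of base_model for each target busbar.

    datasets maps a busbar name to its own (X, y) training pair, so no
    parameters are shared between busbars.
    """
    return {bus: clone(base_model).fit(X, y) for bus, (X, y) in datasets.items()}


# Hypothetical synthetic data: two busbars driven by different predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
datasets = {
    "Crucero": (X, X @ np.array([1.0, 0.0, 0.0])),
    "Puerto Montt": (X, X @ np.array([0.0, 2.0, 0.0])),
}
models = train_node_specific(datasets, LinearRegression())
print({bus: np.round(m.coef_, 3) for bus, m in models.items()})
```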

2.4. Data Description and Predictor Variables

The dataset for this study was obtained from the official platform of the NEC [34] and includes operational records from 1 January to 13 December 2024. Data from 14 December onward were only partially available and were therefore excluded. The dataset consists of hourly observations, resulting in 8352 entries per variable (348 days × 24 h). This level of temporal granularity enables detailed modeling of short-term system behavior and price dynamics. Price predictions were not conducted for the entire system; instead, the analysis focused on representative busbars selected through a correlation analysis and according to their operational significance within the NES.

The following variables were included in the predictor dataset:

System Marginal Price (SMP) at multiple nodes (USD/MWh).
System Net Demand, measured in gigawatts (GW), representing the total system consumption minus the VRG.
Generation Dispatch by Technology Type (MW), categorized by energy source (e.g., hydroelectric, wind, solar, coal, natural gas), as reported by the NEC.
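The net demand definition above can be written as a one-line computation. The sketch assumes that VRG comprises solar and wind generation only, as in Section 1, and that consumption and generation are reported in MW before conversion to GW; the function name and values are hypothetical.

```python
import pandas as pd


def net_demand_gw(demand_mw, solar_mw, wind_mw):
    """Net demand in GW: total consumption minus variable renewable generation."""
    return (demand_mw - (solar_mw + wind_mw)) / 1000.0


# Hypothetical hourly values in MW
hours = pd.date_range("2024-01-01 10:00", periods=3, freq="60min")
demand = pd.Series([10000.0, 10500.0, 11000.0], index=hours)
solar = pd.Series([5000.0, 5500.0, 5200.0], index=hours)
wind = pd.Series([1000.0, 800.0, 1300.0], index=hours)
print(net_demand_gw(demand, solar, wind).tolist())  # [4.0, 4.2, 4.5]
```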

No outlier removal or additional filtering was applied to the raw data, thereby preserving its integrity for exploratory analysis and modeling.

2.5. Data Segmentation

The data segmentation is used to assess the spatial variability of electricity prices across the NES for M2. It allows the use of a reduced number of inputs from clustered busbars. The clustering was carried out using Pearson's correlation coefficient (ρ).
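A simple correlation-based selection of SMP inputs for a given target busbar can be sketched with pandas. The threshold of 0.9 is a hypothetical choice (Section 3.2 reports within-zone correlations between 0.85 and 1), and the helper name and toy series are illustrative; the text does not state the exact selection rule used.

```python
import pandas as pd


def correlated_busbars(smp, target, threshold=0.9):
    """Return busbars whose SMP Pearson correlation with `target` >= threshold.

    smp is an hourly DataFrame with one column per busbar. The target busbar
    itself (rho = 1) is kept, so its own lagged SMP remains an input.
    """
    rho = smp.corr(method="pearson")[target]
    return rho[rho >= threshold].index.tolist()


# Hypothetical SMP series (USD/MWh) for four busbars
smp = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with A
    "C": [5.0, 4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with A
    "D": [3.0, 1.0, 3.0, 1.0, 3.0],   # uncorrelated with A
})
print(correlated_busbars(smp, "A"))  # ['A', 'B']
```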

2.6. Training and Testing of ML Models

The datasets for January and July were used to test the ML models. These months were selected so that the models could be tested in summer and winter, which differ in VRG characteristics, particularly the lower solar input in winter. The evaluated models are described in the following section.

2.7. Machine Learning Models

Six regression models were considered: Linear Regression (LR), Bayesian Ridge (BR), Automatic Relevance Determination (ARD), Support Vector Regression (SVR), Decision Tree Regression (DTR), and Random Forest Regression (RFR). LR serves as a transparent baseline under a linear-additive assumption,

y = β_{0} + \sum_{i = 1}^{p} β_{i} x_{i} + ϵ,

(1)

where the target variable y is expressed as a linear combination of p explanatory variables x_{i}, plus a stochastic error term ϵ. The parameter β_{0} denotes the intercept, while the coefficients β_{i} (i = 1, \dots, p) weight the contribution of each predictor to the response variable. LR provides a conservative reference for quantifying improvements achieved by non-linear and ensemble models.

BR and ARD extend the linear formulation through Bayesian regularization, improving robustness under multicollinearity and high-dimensional inputs. SVR applies structural risk minimization with kernel-based non-linear mappings, while DTR and RFR capture non-linear interactions through recursive partitioning and ensemble averaging, respectively. All models were implemented using scikit-learn under identical configurations across target nodes.

2.8. Hyperparameter Configuration

Default hyperparameters were retained for LR, DTR, BR, and ARD. For SVR, a polynomial kernel of degree three with regularization parameter C = 10^{4} was adopted to better represent non-linear price dynamics. For RFR, min_samples_split was set to 20 to limit tree depth and reduce variance, with parallel execution enabled (n_jobs = −1). Hyperparameter tuning relied exclusively on training and validation data, excluding the seasonal test months.
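The six regressors and the hyperparameters reported above map onto scikit-learn estimators as sketched below. Parameters not mentioned in the text are left at their library defaults, including `random_state`, which the text does not report; the factory name `make_models` is hypothetical and the dictionary keys reuse the abbreviations of Section 2.7.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ARDRegression, BayesianRidge, LinearRegression
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor


def make_models():
    """Return fresh, unfitted instances of the six regressors."""
    return {
        "LR": LinearRegression(),        # baseline, default settings
        "BR": BayesianRidge(),           # default settings
        "ARD": ARDRegression(),          # default settings
        "DTR": DecisionTreeRegressor(),  # default settings
        # min_samples_split=20 limits tree growth; n_jobs=-1 uses all cores
        "RFR": RandomForestRegressor(min_samples_split=20, n_jobs=-1),
        # third-degree polynomial kernel with C = 10^4
        "SVR": SVR(kernel="poly", degree=3, C=1e4),
    }


for name, model in make_models().items():
    print(name, type(model).__name__)
```

Returning new instances on every call avoids accidentally reusing a fitted estimator across target busbars.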

2.9. Performance Metrics

Forecast accuracy was evaluated using the Pearson correlation coefficient (ρ), mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination (R^{2}). The Pearson correlation coefficient ρ quantifies linear agreement between observed and predicted SMP,

ρ = \frac{\sum_{i = 1}^{n} (y_{i} - \bar{y}) ({\hat{y}}_{i} - \bar{\hat{y}})}{\sqrt{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}} \sqrt{\sum_{i = 1}^{n} {({\hat{y}}_{i} - \bar{\hat{y}})}^{2}}},

(2)

while the MAE and RMSE measure absolute and dispersion-sensitive errors, respectively:

MAE = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - {\hat{y}}_{i} |, RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}} .

(3)

The coefficient of determination,

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}},

(4)

provides a normalized measure of explained variance relative to a mean baseline. Together, these metrics enable consistent comparisons across models, nodes, and seasonal regimes. In Equations (2)–(4), y_{i} denotes the observed value of the target variable for the i-th sample, {\hat{y}}_{i} the corresponding predicted value, and n the number of samples, while \bar{y} and \bar{\hat{y}} are the arithmetic means of the observed and predicted values, respectively.
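Equations (2)–(4) translate directly into NumPy. The sketch below is a minimal implementation of those definitions; `forecast_metrics` is a hypothetical helper name and the toy values are illustrative.

```python
import numpy as np


def forecast_metrics(y, y_hat):
    """Pearson rho, MAE, RMSE and R^2 as defined in Equations (2)-(4)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    dy = y - y.mean()              # deviations of observed values
    dy_hat = y_hat - y_hat.mean()  # deviations of predicted values
    err = y - y_hat
    rho = np.sum(dy * dy_hat) / (np.sqrt(np.sum(dy**2)) * np.sqrt(np.sum(dy_hat**2)))
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err**2))
    r2 = 1.0 - np.sum(err**2) / np.sum(dy**2)
    return {"rho": rho, "MAE": mae, "RMSE": rmse, "R2": r2}


m = forecast_metrics([1, 2, 3, 4], [1, 2, 3, 5])
print({k: round(float(v), 3) for k, v in m.items()})
# {'rho': 0.983, 'MAE': 0.25, 'RMSE': 0.5, 'R2': 0.8}
```

Because ρ is insensitive to a constant bias in the forecasts while MAE, RMSE, and R^{2} penalize it, the metrics can rank models differently.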

3. Results

In this section, the main results of the paper are presented, focusing on the performance differences between the two strategies used for SMP forecasting: Methodology 1 (M1), which uses SMP inputs from all NES busbars, and Methodology 2 (M2), which restricts SMP inputs to a limited set of correlated busbars within the corresponding NES subsystem.

3.1. Exploratory Analysis from the NES Data

A graphical analysis of the NES SMP for 2024 was performed using density distribution plots for 28 busbars, as shown in Figure 6. The distributions are arranged from northern busbars at the top to southern busbars at the bottom, illustrating the geographic dependence of SMP behavior and highlighting systematic spatial variations in SMP values across the system. The SMP distributions throughout the NES show a high frequency of USD 0/MWh values, driven by variable renewable generation, particularly photovoltaic generation in the northern and central parts of the country and wind generation in the northern and southern parts of the NES [12].
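The frequency of USD 0/MWh prices visible in the density plots can be quantified per busbar with a short pandas expression. The tolerance, function name, and column names below are hypothetical.

```python
import pandas as pd


def zero_price_share(smp, tol=1e-6):
    """Fraction of hours in which each busbar's SMP is (numerically) zero."""
    return (smp.abs() <= tol).mean()


# Hypothetical hourly SMPs (USD/MWh) for a northern and a southern busbar
smp = pd.DataFrame({"north": [0.0, 0.0, 35.0, 60.0],
                    "south": [80.0, 0.0, 90.0, 95.0]})
print(zero_price_share(smp).to_dict())  # {'north': 0.5, 'south': 0.25}
```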

3.2. Data Segmentation of the SMP from the NES

Figure 7 shows the Pearson correlation matrix of the SMP values among the NES busbars. The busbars are ordered geographically from north to south along both the y-axis (top to bottom) and the x-axis (left to right). This correlation matrix serves as the basis for clustering busbars according to their correlation coefficients, reducing the dimensionality of the inputs in M2 by grouping highly correlated nodes into representative clusters. The northern busbars of the NES are strongly correlated: the Pearson correlation among the busbars from Parinacota to Lagunas is 1, and a strong correlation (0.95) extends further south to the N.P. Azucar busbar. This area of the NES is characterized by a large share of photovoltaic generation, with much of this capacity installed in the Antofagasta region [11], and it supplies power to mining companies, whose demand outweighs residential energy demand [15]. This zone covers a distance of over 1300 km.

In the central part of the NES, the busbars are less strongly correlated than in the northern part, with Pearson correlations ranging from 0.85 to 1 for the busbars from Quillota to Cautin. This zone supplies the largest population centers of Chile, including the capital, Santiago, and covers a distance of over 1600 km.

The busbars in the southern part of the NES are subject to high levels of transmission constraints, as shown in Figure 2. Consequently, their distinct behavior is reflected in lower Pearson correlation values relative to the busbars in the central part of the NES. This zone is therefore limited to three busbars with higher mutual correlation, Valdivia, Puerto Montt, and Chiloe, covering a distance of about 370 km.
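One way to turn a correlation matrix such as that of Figure 7 into zones is agglomerative clustering on the distance 1 − ρ. This is a swapped-in technique for illustration; the text does not state which clustering algorithm was used. With complete linkage and a cut at distance 0.15, every pair of busbars inside a cluster has ρ ≥ 0.85, the lower bound reported for the central zone. The cut value, helper name, and synthetic profiles are assumptions.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_busbars(smp, max_distance=0.15):
    """Group busbars by SMP correlation using complete-linkage clustering.

    Distance between busbars is 1 - rho, so with complete linkage every pair
    inside a cluster satisfies rho >= 1 - max_distance.
    """
    rho = smp.corr(method="pearson").to_numpy()
    dist = np.clip(1.0 - rho, 0.0, None)
    dist = (dist + dist.T) / 2.0  # enforce exact symmetry
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="complete")
    labels = fcluster(Z, t=max_distance, criterion="distance")
    return dict(zip(smp.columns, labels.tolist()))


# Hypothetical daily SMP profiles: two "northern" and two "southern" busbars
t = np.arange(24)
north = 50 + 40 * np.sin(2 * np.pi * t / 24)
south = 80 + 20 * np.cos(2 * np.pi * t / 24)
smp = pd.DataFrame({"N1": north, "N2": 1.1 * north + 2,
                    "S1": south, "S2": 0.9 * south + 5})
print(cluster_busbars(smp))
```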

Figure 7 reflects how Chile's geography and energy matrix shape SMP behavior; three zones, corresponding to three SMP clusters, are identified: the northern, central, and southern zones. The SMP values from a representative busbar of each zone, shown in Table 2, are selected as target variables for the SMP forecast based on their location and their productive and demographic importance:

Zone I: busbar Crucero;
Zone II: busbar Alto Jahuel;
Zone III: busbar Puerto Montt.

The busbar Crucero is located in the Antofagasta region, a copper mining hub that accounts for 13.9% of global copper production [35]. This busbar is the main node supplying energy to this industry. The busbar Alto Jahuel is the main node feeding the city of Santiago, the most populated city in Chile, with over 7.1 million inhabitants according to the 2017 census, about 40% of the total population. Finally, the busbar Puerto Montt supplies energy to the largest southern city of the NES, Puerto Montt, which has a total population of over 240,000 people [36].

3.3. SMP Forecast Results

In this subsection, the predictive performance of various machine learning models for estimating hourly electricity prices at representative nodes of the NES is quantitatively compared using Methodology 1 (M1) and Methodology 2 (M2) and summarized in Table 3.

Model behavior at the Crucero busbar shows that RFR under M1 attains MAE = 5.42 USD/MWh with R^{2} = 0.94, and M2 reports nearly identical results (MAE = 5.40 USD/MWh, R^{2} = 0.94), indicating that the ranking is stable with respect to the methodological variant. At the other end of the spectrum, the Decision Tree Regressor (DTR) exhibits the weakest performance across targets, with particularly large errors at Puerto Montt, where the RMSE reaches 35.47 USD/MWh (M1) and 35.53 USD/MWh (M2), along with comparatively lower fit statistics. To facilitate visual comparison across models and targets, the MAE is selected as the primary metric for graphical representation, given its intuitive interpretability and consistent behavior across methodologies.
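Whether small average differences between M1 and M2 are systematic can be checked with the paired Wilcoxon signed-rank test mentioned in the abstract. The sketch below uses `scipy.stats.wilcoxon` on hourly absolute errors and assumes that both methodologies are evaluated on the same test hours, so the errors can be paired hour by hour; the helper name and toy values are hypothetical.

```python
import numpy as np
from scipy.stats import wilcoxon


def compare_methodologies(y_true, pred_m1, pred_m2):
    """Paired Wilcoxon signed-rank test on hourly absolute errors of M1 and M2.

    Returns the test statistic, the two-sided p-value, and the MAE difference
    (M1 minus M2; positive values favor M2).
    """
    y_true = np.asarray(y_true, dtype=float)
    err_m1 = np.abs(y_true - np.asarray(pred_m1, dtype=float))
    err_m2 = np.abs(y_true - np.asarray(pred_m2, dtype=float))
    result = wilcoxon(err_m1, err_m2)
    return result.statistic, result.pvalue, err_m1.mean() - err_m2.mean()


# Hypothetical test-hour prices and forecasts (USD/MWh)
y_true = np.zeros(8)
pred_m1 = np.arange(5.0, 13.0)              # absolute errors 5, 6, ..., 12
pred_m2 = pred_m1 - np.arange(1, 9) / 10.0  # M2 errors smaller by 0.1..0.8
stat, p_value, mae_gap = compare_methodologies(y_true, pred_m1, pred_m2)
print(stat, round(p_value, 4), round(mae_gap, 2))
```

Pairing by hour removes the large common variation in errors across hours, which is why a consistent but small MAE gap can still be statistically significant.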

Figure 8 shows the evaluation results based on the Pearson correlation coefficient between the predicted and actual SMP values. The Bayesian Ridge (BR), Random Forest Regressor (RFR), and Support Vector Regressor (SVR) models consistently rank among the top three, exhibiting high correlation values (approximately 0.85–0.90) for the Crucero and Alto Jahuel busbars (Zones I and II) under both methodologies. RFR and SVR can capture non-linear relationships, while BR benefits from Bayesian regularization; all three consistently outperform the baseline model (LR).

However, for the Puerto Montt busbar, the predictive performance of these models declines during the winter period, as illustrated in Figure 8. This reduction coincides with a higher decoupling of SMP values from the rest of the NES. This trend is consistently observed under both M1 and M2. One possible explanation is that the models tend to overfit summer-season patterns in this zone, which compromises their accuracy under different seasonal conditions.

The models perform slightly better in Zone I, which could be explained by the larger number of busbars in this zone and the higher correlation among them. This is consistent with the very low, or nearly absent, SMP decoupling observed among these busbars throughout the year.

DTR performs worst across all zones and methodologies. For Puerto Montt, on the other hand, the baseline LR model outperforms SVR and performs similarly to BR, particularly in July.

Additionally, Figure 9 presents the mean absolute error (MAE) results for SMP forecasts under both methodologies. Lower MAE values indicate better predictive performance. Once again, the RFR and SVR models emerge as the top performers for the Crucero and Alto Jahuel busbars under both M1 and M2.

For the Puerto Montt busbar, model performance is again notably lower than in Zones I and II. However, the MAE shows the opposite seasonal trend to the Pearson correlation coefficient: Figure 9 highlights markedly worse model performance during January, with deviations of approximately 20% to 25%. Because the Pearson coefficient is insensitive to the scale and offset of the predictions, a model can track the shape of the price profile closely while still incurring large absolute errors. The discrepancy may also be attributed to the presence of outliers in the prediction errors; however, this hypothesis would require further validation.

Figure 10 compares the predictive performance of M1 and M2 across the three zones analyzed. In 10 of the 18 model–zone cases, the MAE values were slightly lower under M2 than under M1. In particular, in Zone III (Puerto Montt busbar), a region characterized by higher levels of transmission congestion, forecasts generated using M2 showed improved performance in four of the six models.

The most notable differences in MAE between M1 and M2 were observed in the Decision Tree Regressor (DTR) and Linear Regressor (LR) models. However, this difference diminishes when considering higher-performing models such as RFR and SVR, which maintain consistent accuracy across both methodologies.

These results suggest that, under M1, a larger set of input features does not necessarily lead to better performance when many of those inputs are weakly correlated with the target node. Conversely, the smaller, correlation-filtered input set used in M2 can improve model accuracy, provided that the available data volume is sufficient.
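The difference between the two input sets discussed above can be illustrated with a minimal sketch. It assumes a hypothetical hourly table with one SMP column per busbar and a simple Pearson-threshold rule for M2; the zones in Table 2 were identified through nodal correlation analysis, so the exact rule and the default threshold of 0.95 used here are assumptions rather than the authors' procedure.

```python
import pandas as pd


def select_correlated_nodes(smp: pd.DataFrame, target: str, threshold: float = 0.95) -> list:
    """Nodes whose SMP series has a Pearson correlation above `threshold` with the target.

    `smp` is a hypothetical hourly table with one SMP column per busbar. To avoid
    information leakage, it should contain training-period data only.
    """
    corr = smp.corr(method="pearson")[target]
    return [node for node, rho in corr.items() if rho > threshold]


def input_nodes(smp: pd.DataFrame, target: str, methodology: str) -> list:
    """M1-style: every node; M2-style: only nodes correlated with the target (assumed rule)."""
    if methodology == "M1":
        return list(smp.columns)
    return select_correlated_nodes(smp, target)
```

With this rule the target node is always retained (its self-correlation is 1), so its own lagged prices remain available under both methodologies.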

3.4. Statistical Comparison Between Methodologies

To assess whether the two alternative methodologies (M1 and M2) lead to statistically different predictive performance, a paired non-parametric Wilcoxon signed-rank test was applied with a significance level of $\alpha = 0.05$. The comparison was performed on the distributions of hourly absolute errors obtained by each learning model at each electrical bus. This test does not assume normality and is therefore appropriate for the error distributions typically observed in electricity price forecasting. A two-sided test was first used to evaluate whether the median of the paired error differences is zero, and, when significant differences were detected, one-sided tests were employed to determine the direction of the effect.
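The two-stage procedure described above (a two-sided test, followed by one-sided tests only when the first is significant) can be sketched with SciPy's scipy.stats.wilcoxon. The function name, the output dictionary, and the rule used to report the preferred methodology are illustrative assumptions, not the authors' code; the synthetic errors in the usage lines are constructed so that M2 is always slightly better.

```python
import numpy as np
from scipy.stats import wilcoxon


def compare_methodologies(abs_err_m1, abs_err_m2, alpha=0.05):
    """Paired Wilcoxon signed-rank comparison of hourly absolute errors (M1 vs M2).

    A two-sided test is run first; one-sided tests are run only if it is significant.
    """
    e1 = np.asarray(abs_err_m1, dtype=float)
    e2 = np.asarray(abs_err_m2, dtype=float)
    out = {
        "p_two_sided": float(wilcoxon(e1, e2, alternative="two-sided").pvalue),
        "better": None,
    }
    if out["p_two_sided"] < alpha:
        # wilcoxon(x, y, alternative="less") tests whether the differences x - y are shifted below zero
        out["p_m2_lower"] = float(wilcoxon(e2, e1, alternative="less").pvalue)
        out["p_m1_lower"] = float(wilcoxon(e1, e2, alternative="less").pvalue)
        if out["p_m2_lower"] < alpha:
            out["better"] = "M2"
        elif out["p_m1_lower"] < alpha:
            out["better"] = "M1"
    return out


# Usage on synthetic hourly absolute errors (not data from this study)
rng = np.random.default_rng(0)
err_m1 = rng.uniform(5.0, 15.0, size=200)
err_m2 = err_m1 - rng.uniform(0.5, 1.5, size=200)  # M2 slightly better by construction
result = compare_methodologies(err_m1, err_m2)
```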

Table 4 summarizes the p-values for two-sided and one-sided tests. The results reveal that the methodological choice has a significant impact on forecasting accuracy for several model–bus combinations. At the Puerto Montt bus, robust differences are observed for most models, where M2 yields significantly lower hourly absolute errors for DTR, LR, RFR, and SVR, whereas M1 performs better for ARD. At Crucero, M2 improves the performance of RFR (and marginally improves DTR), while M1 achieves lower errors for ARD. At Alto Jahuel, the behavior is more heterogeneous; M1 reduces the error for RFR and BR, whereas M2 is preferable for SVR. Overall, these results demonstrate that no single methodology is universally optimal, and the selection of M1 or M2 should be conducted in a model- and bus-dependent manner.

The comparison between the average MAE presented in Figure 10 and the results of the paired Wilcoxon test shows a high level of consistency between the descriptive and inferential analyses. In particular, the visual reductions in error obtained with M2 for DTR, LR, RFR, and SVR at the Puerto Montt bus are statistically confirmed, as is the superior performance of M1 for ARD. A consistent behavior is also observed at Crucero, where M2 improves RFR and DTR while M1 yields lower errors for ARD, and at Alto Jahuel, where mixed methodological patterns are reflected in significant differences in favor of M2 for SVR and in favor of M1 for RFR and BR.

3.5. Temporal Analysis

Figure 11, Figure 12 and Figure 13 show the mean absolute error (MAE) of electricity price forecasts for each hour of the day, averaged across the test days, for three representative busbars: Crucero, Alto Jahuel, and Puerto Montt. The results correspond to three machine learning models, the Random Forest Regressor (RFR), Support Vector Regressor (SVR), and Bayesian Ridge (BR), evaluated under both Methodology 1 (M1) and Methodology 2 (M2).
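The hourly profiles in Figures 11 to 13 amount to grouping absolute errors by hour of day and averaging across test days. A minimal pandas sketch, assuming hypothetical timestamps aligned with the forecasts:

```python
import numpy as np
import pandas as pd


def hourly_mae_profile(timestamps, y_true, y_pred) -> pd.Series:
    """Mean absolute error for each hour of the day (0-23), averaged across test days."""
    frame = pd.DataFrame({
        "hour": pd.DatetimeIndex(timestamps).hour,
        "abs_err": np.abs(np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)),
    })
    return frame.groupby("hour")["abs_err"].mean()
```

Applied separately to each model, busbar, and methodology, this yields one 24-point curve per panel.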

This section provides a detailed hourly error analysis aimed at assessing the temporal behavior of forecasting errors and the models’ robustness in capturing diurnal electricity price dynamics.

The confidence intervals were generated using the seaborn Python library and computed through a bootstrapping procedure applied across days for each hour of the day. In this approach, the data were randomly resampled with replacement 1000 times, and the statistic of interest was recalculated for each resample. This process yields an empirical distribution that approximates the sampling distribution of the estimator. The 95% confidence intervals were then constructed from the 2.5th and 97.5th percentiles of the resulting bootstrap distribution, as shown in Figure 11, Figure 12 and Figure 13.
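The percentile bootstrap described above can be written out explicitly. The following sketch is similar in spirit to the percentile bootstrap that seaborn applies for errorbar=("ci", 95) with n_boot=1000, but it is an independent illustration rather than seaborn's internal code; the function name and the fixed seed are assumptions.

```python
import numpy as np


def bootstrap_mean_ci(values, n_boot=1000, level=95.0, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    `values` would hold one hour's absolute errors, one entry per test day.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample days with replacement n_boot times and recompute the mean each time
    idx = rng.integers(0, values.size, size=(n_boot, values.size))
    boot_means = values[idx].mean(axis=1)
    tail = (100.0 - level) / 2.0  # 2.5 for a 95% interval
    low, high = np.percentile(boot_means, [tail, 100.0 - tail])
    return float(values.mean()), float(low), float(high)
```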

As previously discussed, the Crucero busbar is located in a well-connected, low-congestion zone, which allows Methodology 1 (M1) to generalize effectively without introducing significant forecasting error; Methodology 2 (M2) follows a similar trend. The MAE remains consistently low across all hours, with average values below 10 USD/MWh. The Bayesian Ridge (BR) and Support Vector Regressor (SVR) models both exhibit comparable performance under M1 and M2, which can be attributed to Crucero's strong correlation with other busbars, as illustrated in the heatmap in Figure 7.

Alto Jahuel, a centrally located busbar with moderate levels of transmission congestion, exhibits an average MAE below 10 USD/MWh. As in Zone I (Crucero), higher MAE values are recorded at 03:00, 08:00, 12:00, and 20:00 h. These hours coincide with pronounced ramping in the System Marginal Price (SMP), driven primarily by changes in generation dispatch and load transitions.

Overall, model performance improves when the SMP follows a relatively stable profile, indicating that forecasting accuracy is closely linked to the volatility and variability of the price signal.

Puerto Montt exhibits significant temporal price volatility, largely attributed to frequent transmission congestion, as illustrated in Figure 2 and Table 1. This structural constraint likely contributes to the higher MAE values observed in comparison to Zones I and II, reflecting reduced forecast precision under both Methodology 1 (M1) and Methodology 2 (M2). The localized training employed in M2 yields performance levels similar to those obtained with M1.

The average MAE at this busbar reaches approximately 19 USD/MWh, nearly double the average reported for Crucero and Alto Jahuel. Notably, elevated MAE values are concentrated in the 09:00–11:00 and 19:00–20:00 intervals, so high-error peaks span more hours of the day than at the other two busbars.

When the holdout evaluation is disaggregated by month, January and July do not behave identically, suggesting a seasonal component in the underlying process. In several cases, July is associated with higher absolute errors (e.g., at Crucero and Charrúa), whereas $R^{2}$ can remain comparable or even improve despite a larger MAE, which is consistent with a change in variability or operating regimes between months. Taken together, these results support that robust ensemble-style learners such as RFR provide the most dependable accuracy across bars and months for marginal cost forecasting, and that seasonal effects should be treated explicitly during model design, either through month-aware features or through evaluation protocols that preserve the temporal structure of the system.
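One hypothetical way to make the models month-aware, as suggested above, is a cyclical encoding of calendar variables. This sketch is an illustrative assumption and was not part of the evaluated M1 and M2 pipelines; it assumes a hypothetical DataFrame indexed by hourly timestamps. When January and July are held out entirely, the model never sees these month values during training, so the encoding relies on interpolation between neighboring months.

```python
import numpy as np
import pandas as pd


def add_cyclical_calendar_features(df: pd.DataFrame) -> pd.DataFrame:
    """Append sine/cosine encodings of hour of day and month of year.

    Assumes `df` has a DatetimeIndex of hourly timestamps.
    """
    out = df.copy()
    hour = np.asarray(out.index.hour)
    month = np.asarray(out.index.month)
    out["hour_sin"] = np.sin(2 * np.pi * hour / 24)
    out["hour_cos"] = np.cos(2 * np.pi * hour / 24)
    # Map months 1..12 onto a circle so that December and January are neighbors
    out["month_sin"] = np.sin(2 * np.pi * (month - 1) / 12)
    out["month_cos"] = np.cos(2 * np.pi * (month - 1) / 12)
    return out
```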

Slightly higher error values are observed at 03:00, 08:00, 12:00, and 20:00 h. These time points correspond to significant transitions in the generation mix, particularly at 08:00 and 20:00 h, when solar energy ramps up and down, respectively. Additionally, demand ramp periods, such as around 13:00 and 20:00, contribute to increased variability and forecasting difficulties during those hours. These patterns can be explained by analyzing Table 5, where the most relevant variables for SMP forecasting are shown for the RFR model.

3.6. Variable Importance Analysis

A permutation-based importance analysis was performed for each estimated SMP node using Random Forest Regressor (RFR) models trained under two different methodologies (M1 and M2). The five input features that induced the greatest deterioration in predictive performance when randomly permuted are reported along with their corresponding normalized importance scores in Table 5.
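A permutation importance ranking of this kind can be sketched with scikit-learn's sklearn.inspection.permutation_importance. The MAE-based scoring, the number of repeats, and the normalization rule (clipping negative scores to zero and rescaling to sum to one) are assumptions, since the normalization behind Table 5 is not specified; the synthetic data and feature names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance


def top_permutation_features(model, X_test, y_test, feature_names, k=5, seed=0):
    """Top-k features by permutation importance, with scores rescaled to sum to one."""
    result = permutation_importance(
        model,
        X_test,
        y_test,
        scoring="neg_mean_absolute_error",  # importance = increase in MAE after shuffling
        n_repeats=10,
        random_state=seed,
    )
    # Negative scores mean shuffling did not hurt; treat them as zero importance
    scores = np.clip(result.importances_mean, 0.0, None)
    total = scores.sum()
    normalized = scores / total if total > 0 else scores
    order = np.argsort(normalized)[::-1][:k]
    return [(feature_names[i], float(normalized[i])) for i in order]


# Usage on synthetic data (hypothetical feature names, not the study's inputs)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 5.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=500)
rfr = RandomForestRegressor(n_estimators=50, random_state=0).fit(X[:400], y[:400])
top = top_permutation_features(rfr, X[400:], y[400:], ["smp_lag1", "solar_gen", "net_demand"], k=3)
```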

The permutation-based importance analysis reveals a consistent dominance of autoregressive price information across all nodes and training methodologies. In both M1 and M2, the most influential predictors are systematically the lagged SMP values of the target node or electrically coupled neighboring nodes, highlighting the strong temporal persistence and spatial dependency inherent to marginal price formation in interconnected power systems.
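Lagged predictors of the kind that dominate this ranking can be built as in the following sketch. It assumes a hypothetical hourly table with one SMP column per busbar, a one-hour-ahead target, and an illustrative lag set; under M1 the node list would include all busbars, and under M2 only the busbars of the target's correlated subsystem.

```python
import pandas as pd


def make_lagged_inputs(smp: pd.DataFrame, target: str, nodes, lags=(1, 2, 24)):
    """Lagged SMP predictors from `nodes` for a one-hour-ahead forecast of `target`.

    `smp` is a hypothetical hourly table (one column per busbar); the lag set is an assumption.
    """
    features = {
        f"{node}_lag{lag}": smp[node].shift(lag)  # value observed `lag` hours earlier
        for node in nodes
        for lag in lags
    }
    X = pd.DataFrame(features, index=smp.index)
    y = smp[target]
    # Drop the initial hours whose lags reach back before the start of the record
    valid = X.notna().all(axis=1)
    return X[valid], y[valid]
```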

Although exogenous physical variables such as solar generation, thermal generation, and net demand exhibit comparatively lower individual importance scores, their recurrent inclusion among the top-ranked features indicates a non-negligible contribution to the model’s explanatory capability, particularly in capturing regime-dependent and meteorologically driven variations. Notably, net demand does not consistently appear as a dominant predictor, being ranked among the five most relevant variables in only two out of six cases. This behavior highlights a fundamental distinction between data-driven machine learning approaches and traditional optimization-based methods for SMP estimation, which rely primarily on the explicit mapping between generation costs and a known net demand.

4. Discussion

This study evaluated electricity price forecasting accuracy in the Chilean National Electric System (NES) using two distinct methodological frameworks. While Methodology 1 (M1) utilizes a high-dimensional, system-wide approach, Methodology 2 (M2) adopts a localized strategy via correlation-based clustering. This spatial segmentation effectively captures the longitudinal topology of the NES, identifying three coherent zones represented by the Crucero, Alto Jahuel, and Puerto Montt busbars.

Our findings indicate that the Bayesian Ridge (BR), Support Vector Regressor (SVR), and Random Forest Regressor (RFR) models generally outperform the other models in both seasonal test periods. Their robustness likely stems from their ability to cope with the volatility inherent in the NES, particularly during the solar ramp-down events observed in the late afternoon; RFR and SVR can additionally capture non-linear relationships, while BR benefits from Bayesian regularization. As shown in the intra-day error analysis, these periods of rapid demand–supply transition represent the primary source of temporal non-stationarity, a challenge that ML-based approaches are well equipped to manage compared with traditional time-series models [5,23].

The permutation-based importance analysis revealed, as expected, that lagged SMP values from both the target busbar and electrically adjacent busbars play a dominant role in the forecasting process. Additionally, the results indicate that the generation mix exhibits a consistently higher relative importance than net demand, which can be attributed to the prevailing overgeneration conditions observed in the NES during the evaluation period.

A critical contribution of this work is the quantification of the performance gap between M1 and M2. In well-connected regions like Zone I (Crucero), the methodologies yield nearly identical average results. This suggests that, in the absence of transmission bottlenecks, the inclusion of system-wide information does not introduce significant “noise,” and the models are capable of filtering relevant signals even from a global dataset. However, the advantage of M2 becomes statistically evident in congestion-prone areas like Puerto Montt (Zone III). At these decoupled nodes, M2 improves accuracy for models such as DTR and LR by excluding weakly informative inputs from distant subsystems. This aligns with the findings of Díaz et al. [8] and suggests that ignoring topological constraints under network stress can lead to systematic forecasting errors.

The seasonal deterioration of performance in the southern subsystem (Puerto Montt) during winter further underscores the sensitivity of price formation, and therefore of ML forecasts, to hydrological shifts, variations in generation patterns, and network congestion. These observations are consistent with the findings of Zhao et al. [27], who employed system dynamics modeling to show how market coupling and regulatory design interact with physical network limitations to shape price volatility and supply-side behavior.

Temporal analysis further revealed that forecast errors fluctuate throughout the day, with peaks during periods of ramping demand or transitions in renewable output, such as the solar ramp-down at dusk. These hourly patterns indicate that System Marginal Price (SMP) behavior exhibits both spatial and temporal non-stationarity.

Finally, while interpretability tools such as SHAP could provide deeper insight into individual model behavior beyond the permutation importance analysis of Section 3.6, they were intentionally not pursued in order to maintain a clear focus on the methodological comparison of spatial input strategies. This choice avoids confounding model-specific interpretation with the broader impact of network-aware data selection. Future research will build upon these findings by integrating explainable AI (XAI) to further elucidate the drivers of price decoupling across different congestion regimes.

5. Conclusions

This study evaluated the accuracy of electricity price forecasting in the Chilean National Electric System (NES) using machine learning (ML) models under two distinct methodological approaches. The Bayesian Ridge, Random Forest Regressor, and Support Vector Regressor models generally yielded the most accurate SMP forecasts across zones and methodologies.

The results also confirm that the adopted methodology significantly affects SMP estimation errors and that this effect is strongly bus-dependent. Furthermore, they confirm the value of localized training: Methodology 2 (M2) achieved accuracy comparable to M1 in many instances despite using a smaller set of inputs. This finding supports the arguments of Hooshmand and Sharma [28] and Panapakidis and Moschakis [26], who advocate for region-specific models in heterogeneous grids. At the Puerto Montt bus (the most decoupled of the three estimated busbars), M2 consistently reduced the error for most models (DTR, LR, RFR, and SVR), whereas M1 was preferable only for ARD. At Crucero, M2 performed better for RFR, while M1 was more suitable for ARD. At Alto Jahuel, the behavior was mixed, with M1 outperforming for RFR and BR and M2 yielding lower errors for SVR.

Consequently, no single methodology can be considered universally optimal, and the choice between M1 and M2 should be made jointly considering both the learning model and the electrical bus.

Transmission constraints were identified as a critical factor influencing model performance. In highly congested zones such as Puerto Montt, both M1 and M2 models exhibited substantially higher forecast errors. Conversely, in well-connected areas like Crucero, the ML models performed markedly better, indicating greater effectiveness in regions with fewer operational constraints.

Forecast accuracy was also affected by seasonal effects and by diurnal System Marginal Price (SMP) behavior, driven by intra-day variations in demand and generation mix, which shaped the hourly predictive accuracy of the ML models. These observations are consistent with the permutation importance analysis.

Future research should focus on improving forecasting performance through alternative preprocessing techniques, such as feature scaling; the development of hybrid modeling approaches; and the integration of transmission system information and renewable resource data, such as wind speed and solar radiation, into machine learning frameworks. Future work could also extend the forecast horizon beyond one-hour-ahead prediction to multistep or day-ahead forecasting. Further improvement could come from multiyear analyses that capture changes in policy, disruptions caused by critical events, and the impacts of new technologies, among other phenomena.

These advancements are expected to provide predictive advantages over models that rely exclusively on supply–demand fundamentals.

Author Contributions

Conceptualization, R.L., C.C. and S.V.; methodology, R.A.-G., R.L., S.V. and F.R.L.; software, R.A.-G., S.V. and R.J.V.S.M.; validation, R.L., S.V., R.J.V.S.M. and R.A.-G.; formal analysis, G.R., F.R.L. and R.A.-G.; investigation, R.L., C.C. and S.V.; data curation, S.V. and R.A.-G.; visualization, G.R., R.A.-G. and S.V.; writing—original draft preparation, F.R.L., R.L. and R.A.-G.; writing—review and editing, G.R., R.A.-G., F.R.L. and R.L.; supervision, R.L., S.V. and R.A.-G. All authors have read and agreed to the published version of the manuscript.

Funding

F.R.L. and R.A.-G. acknowledge the support from the Chilean National Research and Development Agency [grant number EQM220137] and the Universidad del Bío-Bío. R.L., S.V. and G.R. acknowledge the support from the Research Direction of Universidad Católica de la Santísima Concepción.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board (or Ethics Committee) of Universidad del Bío-Bío.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding authors.

Acknowledgments

During the preparation of this work, the authors used ChatGPT v5.1 and ChatGPT v5.2 to improve language and readability, as well as the quality of images. After using these tools, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
ARD	Automatic Relevance Determination
BR	Bayesian Ridge
DTR	Decision Tree Regressor
LR	Linear Regressor
MAE	Mean Absolute Error
ML	Machine Learning
M1	Methodology 1 (full dataset)
M2	Methodology 2 (correlation-based selection)
NES	National Electric System (Chile)
NEC	National Electric Coordinator (Chile)
PPA	Power Purchase Agreements
RFR	Random Forest Regressor
RMSE	Root Mean Squared Error
SMP	System Marginal Price
SVR	Support Vector Regressor
VRE	Variable Renewable Energy
VRG	Variable Renewable Generation

References

1. Lago, J.; De Ridder, F.; De Schutter, B. Forecasting Spot Electricity Prices: Deep Learning Approaches And Empirical Comparison Of Traditional Algorithms. Appl. Energy 2018, 221, 386–405.
2. Alshater, M.M.; Kampouris, I.; Marashdeh, H.; Atayah, O.F.; Banna, H. Early Warning System to Predict Energy Prices: The Role of Artificial Intelligence and Machine Learning. Ann. Oper. Res. 2025, 345, 1297–1333.
3. Tschora, L.; Pierre, E.; Plantevit, M.; Robardet, C. Electricity Price Forecasting On The Day-Ahead Market Using Machine Learning. Appl. Energy 2022, 313, 118752.
4. Billé, A.G.; Gianfreda, A.; Del Grosso, F.; Ravazzolo, F. Forecasting electricity prices with expert, linear, and nonlinear models. Int. J. Forecast. 2023, 39, 570–586.
5. Yang, W.; Sun, S.; Hao, Y.; Wang, S. A Novel Machine Learning-Based Electricity Price Forecasting Model Based On Optimal Model Selection Strategy. Energy 2022, 238, 121989.
6. Yang, Z.; Ce, L.; Lian, L. Electricity Price Forecasting By A Hybrid Model, Combining Wavelet Transform, Arma And Kernel-Based Extreme Learning Machine Methods. Appl. Energy 2017, 190, 291–305.
7. Ribeiro, M.; Stefenon, S.; De Lima, J.; Nied, A.; Mariani, V.; Coelho, L. Electricity Price Forecasting Based on Self-Adaptive Decomposition and Heterogeneous Ensemble Learning. Energies 2020, 13, 5190.
8. Díaz, G.; Coto, J.; Gómez-Aleixandre, J. Prediction And Explanation of the Formation of the Spanish Day-Ahead Electricity Price Through Machine Learning Regression. Appl. Energy 2019, 239, 610–625.
9. Alkawaz, A.; Abdellatif, A.; Kanesan, J.; Khairuddin, A.; Gheni, H. Day-Ahead Electricity Price Forecasting Based on Hybrid Regression Model. IEEE Access 2022, 10, 108021–108033.
10. Laitsos, V.; Vontzos, G.; Bargiotas, D.; Daskalopulu, A.; Tsoukalas, L.H. Data-Driven Techniques for Short-Term Electricity Price Forecasting through Novel Deep Learning Approaches with Attention Mechanisms. Energies 2024, 17, 1625.
11. ACERA Asociación Chilena de Energías Renovables y Almacenamiento AG. Estadísticas: Sector de Generación de Energía Eléctrica Renovable. Publicación de la Asociación Chilena de Energías Renovables y Almacenamiento. 2025. Available online: https://cdn.acera.cl/wp-content/uploads/2025/07/2025-06-Boletin-Estadisticas-ACERA.pdf (accessed on 1 December 2025).
12. Generadoras de Chile AG. Boletín del Mercado Eléctrico Sector Generación. 2025. Available online: https://generadoras.cl (accessed on 1 December 2025).
13. Systep Ingeniería y Diseños. Reporte Mensual Sector Eléctrico Chileno: Abril 2025; Monthly Market Report; Systep Ingeniería y Diseños S.A.: Santiago, Chile, 2025.
14. Coordinador Eléctrico Nacional. Reporte Energético Abril 2025. 2025. Available online: https://www.coordinador.cl/reportes/documentos/reporte-energetico/2025/ (accessed on 1 December 2025).
15. Coordinador Eléctrico Nacional. Proyección de Demanda de Largo Plazo del Sistema Eléctrico Nacional: Periodo 2024–2044. 2024. Available online: https://www.coordinador.cl/desarrollo/documentos/prospeccion-del-sen/proyeccion-de-demanda-de-largo-plazo/2024-proyeccion-de-demanda-de-largo-plazo/ (accessed on 1 December 2025).
16. Ministerio de Energía. Agenda de Energía 2022–2026: Estamos Presentes; Policy Agenda; Gobierno de Chile: Santiago, Chile, 2022.
17. Ministerio de Energía. Energía 2050: Política Energética de Chile; National Energy Policy; Gobierno de Chile: Santiago, Chile, 2014.
18. Amor, S.B.; Boubaker, H.; Belkacem, L. Forecasting electricity spot price with generalized long memory modeling: Wavelet and neural network. Int. J. Econ. Manag. Eng. 2018, 11, 2307–2323.
19. Shah, I.; Iftikhar, H.; Ali, S. Modeling and Forecasting Electricity Demand and Prices: A Comparison of Alternative Approaches. J. Math. 2022, 2022, 3581037.
20. Yang, Y.; Chen, Y.; Wang, Y.; Li, C.; Li, L. Modelling a combined method based on ANFIS and neural network improved by DE algorithm: A case study for short-term electricity demand forecasting. Appl. Soft Comput. 2016, 49, 663–675.
21. Zheng, K.; Wang, Y.; Liu, K.; Chen, Q. Locational Marginal Price Forecasting: A Componential and Ensemble Approach. IEEE Trans. Smart Grid 2020, 11, 4555–4564.
22. Algarvio, H.; Couto, A.; Lopes, F.; Estanqueiro, A.; Holttinen, H.; Santana, J. Agent-Based Simulation of Day-Ahead Energy Markets: Impact of Forecast Uncertainty and Market Closing Time on Energy Prices. In Proceedings of the 2016 27th International Workshop on Database and Expert Systems Applications (DEXA), Porto, Portugal, 5–8 September 2016; pp. 166–170.
23. Kumar, P.; Anand, V.; Rajasekaran, G.; Sankaranarayanan, S.; Khairuddin, A.S.B.M. Intelligent Energy Price Forecasting using Deep Learning. In AIJR Proceedings; AIJR Publisher: Balrampur, Uttar Pradesh, India, 2022; pp. 35–41.
24. Siddiqui, S.; Macadam, J.; Barrett, M. A novel method for forecasting electricity prices in a system with variable renewables and grid storage. Int. J. Sustain. Energy Plan. Manag. 2020, 27, 51–66.
25. Ugurlu, U.; Oksuz, I.; Tas, O. Electricity Price Forecasting Using Recurrent Neural Networks. Energies 2018, 11, 1255.
26. Panapakidis, I.P.; Moschakis, M.N. Comparison of Machine Learning Models for the Prediction of System Marginal Price of Greek Energy Market. Int. J. Inf. Control Comput. Sci. 2019, 13, 148–152.
27. Zhao, W.; Lin, Y.; Pan, H. What Is the Effect of China’s Renewable Energy Market-Based Coupling Policy?—A System Dynamics Analysis Based on the Coupling of Electricity Market, Green Certificate Market and Carbon Market. Systems 2024, 12, 545.
28. Hooshmand, A.; Sharma, R. Energy Predictive Models with Limited Data using Transfer Learning. In e-Energy ’19: Proceedings of the Tenth ACM International Conference on Future Energy Systems; Association for Computing Machinery: New York, NY, USA, 2019; pp. 12–16.
29. Nowotarski, J.; Weron, R. Recent advances in electricity price forecasting: A review of probabilistic forecasting. Renew. Sustain. Energy Rev. 2018, 81, 1548–1568.
30. Pilot, K.; Ganczarek-Gamrot, A.; Kania, K. Dealing with Anomalies in Day-Ahead Market Prediction Using Machine Learning Hybrid Model. Energies 2024, 17, 4436.
31. Xie, L.; Ilic, M.D. Model predictive dispatch in electric energy systems with intermittent resources. In Proceedings of the 2008 IEEE International Conference on Systems, Man and Cybernetics, Singapore, 12–15 October 2008; pp. 42–47.
32. Saeed, F.; Rehman, A.; Shah, H.; Diyan, M.; Chen, J.; Kang, J.-M. SmartFormer: Graph-based transformer model for energy load forecasting. Sustain. Energy Technol. Assess. 2025, 73, 104133.
33. Zhu, L.; Gao, J.; Zhu, C.; Deng, F. Short-term power load forecasting based on spatial-temporal dynamic graph and multi-scale Transformer. J. Comput. Des. Eng. 2025, 12, 92–111.
34. Coordinador Eléctrico Nacional. 2025. Available online: https://www.coordinador.cl/ (accessed on 1 December 2025).
35. Comisión Chilena del Cobre. Chile aumentará a 27,3% su participación en la producción mundial de cobre en 2034. 2025. Available online: https://www.cochilco.cl/web/chile-aumentara-a-273-su-participacion-en-la-produccion-mundial-de-cobre-en-2034/ (accessed on 1 December 2025).
36. Instituto Nacional de Estadísticas (INE). Resultados Censo 2017. 2017. Available online: http://resultados.censo2017.cl/ (accessed on 1 December 2025).

Figure 1. Main nodes of the NES and their territorial location. (a) Geographic location of reference electric busbars. (b) Interconnection of Chilean busbars (northern busbars). (c) Interconnection of Chilean busbars (central and southern busbars).

Figure 2. Transmission congestion during 2024 in the NES. Source: [12].

Figure 3. Marginal cost of energy at selected bars of the Chilean NES for 1 March 2024.

Figure 4. System load demand and generation dispatch by technology for 1 March 2024.

Figure 5. Schematic of the proposed methodology.

Figure 6. Density graph of the annual record of the SMP along the NES for bars from north (top) to south (bottom) for registered values during the year 2024.

Figure 7. Heatmap showing correlations among SMP values for different NES bars.

Figure 8. Pearson correlation for SMP forecast results for Methodology 1 (above) and Methodology 2 (below) in selected busbars from the NES.

Figure 9. MAE for SMP forecast results for Methodology 1 (above) and Methodology 2 (below) in selected busbars from the NES.

Figure 10. Average MAE of model predictions for M1 and M2 for the three representative busbars: Crucero, Alto Jahuel, and Puerto Montt.

Figure 11. Average MAE for a daily SMP profile for RFR, SVR, and BR forecast models at Crucero busbar for M1 and M2. Black bars represent the 95% confidence interval (C.I.).

Figure 12. Average MAE for a daily SMP profile for RFR, SVR, and BR forecast models at Alto Jahuel busbar for M1 and M2. Black bars represent the 95% confidence interval (C.I.).

Figure 13. Average MAE for a daily SMP profile for RFR, SVR, and BR forecast models at Puerto Montt busbar for M1 and M2. Black bars represent the 95% confidence interval (C.I.).

Table 1. Transmission constraints and SMP decoupling during 2024 in the NES. Source: [12].

Transmission Segment	January 2024: % of Hours Decoupled	January 2024: SMP Δ (USD/MWh)	July 2024: % of Hours Decoupled	July 2024: SMP Δ (USD/MWh)
Crucero–Cardones	4.0%	5.1	5.0%	5.8
Cardones–Pan de Azucar	0.1%	22.1	2.7%	4.5
Pan de Azucar–Quillota	1.7%	10.3	7.8%	7.6
Quillota–Alto Jahuel	7.9%	16.3	5.0%	46.4
Alto Jahuel–Charrúa	0.3%	4.2	28.5%	6.9
Charrúa–Puerto Montt	34.3%	59.4	43.3%	31.9

Note: SMP Δ denotes the average difference in System Marginal Price between decoupled areas.

Table 2. Areas of correlation for SMP forecasting in the NES.

Zone	NES Bars	$R_{xy}$ Range
I	Parinacota, Cóndores, Pozo Almonte, Tarapacá, Collahuasi, Lagunas, Crucero, Encuentro, Atacama, Laberinto, Mejillones, Domeyko, TEN, Chacaya, Los Changos, N. Cardones, N. Maitencillo, N. Pan de Azúcar	$0.95 < R_{xy} < 1$
II	Quillota, Polpaico, Alto Jahuel, Ancoa, Itahue, Charrúa, Cautín	$0.85 < R_{xy} < 1$
III	Valdivia, Puerto Montt, Chiloé	$0.95 < R_{xy} < 1$

Table 3. Regression metrics for M1 and M2 across target bars and candidate models (January and July holdout evaluation; MAE and RMSE in USD/MWh).

Model	Target Bar	Meth.	Combined MAE	Combined RMSE	Combined $r$	Combined $R^{2}$	January MAE	January RMSE	January $r$	January $R^{2}$	July MAE	July RMSE	July $r$	July $R^{2}$
ARD	Alto Jahuel	M1	10.78	17.05	0.88	0.78	11.77	19.00	0.88	0.78	9.79	14.84	0.88	0.77
ARD	Alto Jahuel	M2	10.82	18.03	0.87	0.75	11.61	20.23	0.87	0.75	10.02	15.53	0.86	0.74
ARD	Crucero	M1	10.82	17.54	0.90	0.81	11.83	19.86	0.90	0.81	9.81	14.87	0.91	0.82
ARD	Crucero	M2	11.14	17.97	0.90	0.81	12.07	20.34	0.90	0.81	10.22	15.24	0.90	0.82
ARD	Puerto Montt	M1	19.80	42.62	0.80	0.63	24.66	52.59	0.78	0.60	14.94	29.46	0.79	0.63
ARD	Puerto Montt	M2	20.21	42.12	0.80	0.64	25.35	51.65	0.78	0.61	15.07	29.69	0.79	0.62
BR	Alto Jahuel	M1	10.60	16.51	0.89	0.79	11.29	18.49	0.89	0.79	9.92	14.25	0.89	0.79
BR	Alto Jahuel	M2	10.67	16.55	0.89	0.79	11.37	18.57	0.89	0.79	9.98	14.23	0.89	0.79
BR	Crucero	M1	10.83	16.89	0.91	0.83	12.04	19.46	0.91	0.82	9.61	13.83	0.92	0.84
BR	Crucero	M2	10.78	16.91	0.91	0.83	12.08	19.58	0.90	0.82	9.47	13.74	0.92	0.84
BR	Puerto Montt	M1	17.43	33.47	0.87	0.75	20.39	37.24	0.88	0.77	14.48	29.23	0.80	0.63
BR	Puerto Montt	M2	17.44	33.46	0.87	0.75	20.60	37.46	0.88	0.77	14.29	28.93	0.80	0.64
DTR	Alto Jahuel	M1	12.30	23.46	0.80	0.64	12.85	26.51	0.81	0.65	11.75	19.96	0.79	0.63
DTR	Alto Jahuel	M2	13.09	25.29	0.78	0.61	13.96	27.77	0.79	0.62	12.22	22.55	0.77	0.59
DTR	Crucero	M1	11.46	22.93	0.85	0.72	13.84	27.68	0.83	0.69	9.09	16.88	0.89	0.78
DTR	Crucero	M2	10.46	21.01	0.87	0.76	11.83	24.82	0.86	0.74	9.03	16.26	0.89	0.80
DTR	Puerto Montt	M1	30.66	51.92	0.72	0.52	43.27	65.10	0.69	0.48	18.06	33.97	0.74	0.54
DTR	Puerto Montt	M2	29.93	53.75	0.71	0.51	40.48	66.24	0.68	0.46	19.41	37.31	0.70	0.49
LR	Alto Jahuel	M1	11.43	17.92	0.87	0.76	12.42	20.51	0.86	0.75	10.44	14.89	0.88	0.78
LR	Alto Jahuel	M2	10.75	16.56	0.89	0.79	11.57	18.58	0.89	0.79	9.93	14.27	0.89	0.79
LR	Crucero	M1	11.98	19.92	0.88	0.78	13.46	24.05	0.87	0.75	10.50	14.65	0.91	0.83
LR	Crucero	M2	11.33	17.19	0.91	0.83	12.08	19.24	0.91	0.83	10.59	14.87	0.91	0.83
LR	Puerto Montt	M1	19.73	42.95	0.79	0.63	24.05	52.71	0.77	0.60	15.41	30.18	0.79	0.62
LR	Puerto Montt	M2	18.87	42.67	0.80	0.63	23.44	53.06	0.77	0.60	14.29	28.74	0.80	0.64
RFR	Alto Jahuel	M1	8.90	15.41	0.90	0.82	10.34	18.08	0.89	0.80	7.46	12.16	0.92	0.84
RFR	Alto Jahuel	M2	8.98	15.28	0.90	0.82	10.44	17.84	0.90	0.80	7.51	12.19	0.92	0.84
RFR	Crucero	M1	7.71	14.56	0.93	0.87	9.06	17.59	0.92	0.85	6.35	10.70	0.95	0.90
RFR	Crucero	M2	7.79	14.80	0.93	0.86	9.21	17.98	0.92	0.84	6.36	10.72	0.95	0.90
RFR	Puerto Montt	M1	21.12	34.51	0.86	0.74	28.83	40.69	0.87	0.75	13.40	26.95	0.82	0.68
RFR	Puerto Montt	M2	20.15	33.61	0.87	0.75	26.65	39.19	0.87	0.76	13.66	26.91	0.82	0.68
SVR	Alto Jahuel	M1	9.28	15.20	0.91	0.83	9.82	17.17	0.91	0.83	8.73	12.93	0.91	0.83
SVR	Alto Jahuel	M2	9.18	15.02	0.91	0.83	9.77	16.89	0.91	0.83	8.59	12.88	0.91	0.83
SVR	Crucero	M1	8.97	15.26	0.93	0.86	9.70	17.72	0.93	0.86	8.23	12.33	0.94	0.88
SVR	Crucero	M2	8.99	15.45	0.93	0.86	9.99	18.15	0.92	0.85	8.00	12.17	0.94	0.88
SVR	Puerto Montt	M1	17.39	34.81	0.87	0.75	20.26	38.78	0.88	0.78	14.52	30.31	0.78	0.61
SVR	Puerto Montt	M2	17.11	34.46	0.87	0.76	19.58	38.08	0.89	0.78	14.64	30.43	0.78	0.61
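The four columns of Table 3 can be computed per test period as in the sketch below. It assumes $r$ is the Pearson correlation between observed and forecast SMP and $R^{2}$ is the coefficient of determination $1 - SS_{res}/SS_{tot}$; these are the standard definitions, not confirmed details of the authors' code, and `regression_metrics` is a hypothetical name. The Combined columns are consistent with pooling January and July, which both contain 744 hours: for ARD at Alto Jahuel (M1), (11.77 + 9.79)/2 = 10.78 for MAE and √((19.00² + 14.84²)/2) ≈ 17.05 for RMSE.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float],
                       y_pred: Sequence[float]) -> Dict[str, float]:
    """MAE, RMSE, Pearson r and R^2 for one holdout period."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    ss_pred = sum((p - mean_p) ** 2 for p in y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    return {
        "MAE": sum(abs(e) for e in errors) / n,   # same units as SMP
        "RMSE": math.sqrt(ss_res / n),            # same units as SMP
        "r": cov / math.sqrt(ss_tot * ss_pred),   # Pearson correlation
        "R2": 1.0 - ss_res / ss_tot,              # coefficient of determination
    }
```

Pooling equal-length months and recomputing gives the Combined MAE as the plain average of the monthly MAEs; RMSE pools through the squared errors instead, which is why the Combined RMSE sits slightly above the simple average.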

Table 4. Wilcoxon signed-rank test results (p-values) for absolute error distributions (M1 vs. M2). Values with $p < 0.05$ indicate statistical significance at the 95% confidence level and rejection of the null hypothesis.

Table 4. Wilcoxon signed-rank test results for absolute error distributions (M1 vs. M2). Bold values indicate statistical significance at the 95% confidence level (

p < 0.05

) and rejection of the null hypothesis.

Null Hypothesis	Busbar	DTR	LR	RFR	SVR	ARD	BR
$\mathrm{Med}(M1) = \mathrm{Med}(M2)$	Crucero	0.057	0.917	0.000	0.628	0.000	0.083
	Alto Jahuel	0.113	0.123	0.000	0.048	0.788	0.007
	Puerto Montt	0.000	0.000	0.000	0.011	0.000	0.837
$\mathrm{Med}(M1) \leq \mathrm{Med}(M2)$	Crucero	0.029	0.459	0.000	0.314	1.000	0.959
	Alto Jahuel	0.056	0.061	1.000	0.024	0.394	0.996
	Puerto Montt	0.000	0.000	0.000	0.006	1.000	0.418
$\mathrm{Med}(M1) \geq \mathrm{Med}(M2)$	Crucero	0.972	0.541	1.000	0.686	0.000	0.041
	Alto Jahuel	0.944	0.939	0.000	0.976	0.606	0.004
	Puerto Montt	1.000	1.000	1.000	0.995	0.000	0.582
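Table 4 compares the paired hourly absolute errors of M1 and M2 with a Wilcoxon signed-rank test. The self-contained sketch below uses the large-sample normal approximation (zero differences dropped, average ranks for ties with the usual variance correction, no continuity correction) and returns one p-value per null hypothesis in the same layout as the table; the function name is hypothetical. Rejecting $\mathrm{Med}(M1) \leq \mathrm{Med}(M2)$ means M1 errors tend to be larger, i.e. M2 is better, as for DTR, LR, RFR, and SVR at Puerto Montt. In practice, `scipy.stats.wilcoxon(e1, e2, alternative=...)` performs this test directly.

```python
import math
from typing import Dict, Sequence


def wilcoxon_signed_rank(e1: Sequence[float], e2: Sequence[float]) -> Dict[str, float]:
    """Paired Wilcoxon signed-rank test on d = e1 - e2 (normal approximation).

    e1, e2: hourly absolute errors of M1 and M2 at the same timestamps.
    """
    d = [a - b for a, b in zip(e1, e2)]
    d = [x for x in d if x != 0.0]  # drop zero differences
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank |d| ascending, assigning average ranks to ties.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)  # sum of positive ranks
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    p_greater = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z >= z)
    p_less = 0.5 * math.erfc(-z / math.sqrt(2))     # P(Z <= z)
    return {
        "Med(M1) = Med(M2)": min(1.0, 2 * min(p_greater, p_less)),
        "Med(M1) <= Med(M2)": p_greater,  # small p: M2 errors lower
        "Med(M1) >= Med(M2)": p_less,     # small p: M1 errors lower
    }
```

The two one-sided p-values sum to one and the two-sided value is twice the smaller of them, which matches the pattern in Table 4 (e.g. 0.057, 0.029, and 0.972 for DTR at Crucero).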

Table 5. Permutation-based importance of the five most relevant features for RFR models at different nodes.

Node	M1 Variable	M1 Importance	M2 Variable	M2 Importance
Crucero	SMP at Encuentro busbar$_{t-1}$	0.2090	SMP at Encuentro busbar$_{t-1}$	0.2434
	Solar generation$_{t-1}$	0.1801	Solar generation$_{t-1}$	0.1906
	SMP at Crucero busbar$_{t-1}$	0.1194	Thermal generation$_{t-1}$	0.1370
	Solar generation$_{t-5}$	0.0952	Solar generation$_{t-5}$	0.1041
	Thermal generation$_{t-1}$	0.0737	Net demand$_{t-1}$	0.0993
Alto Jahuel	SMP at Alto Jahuel busbar$_{t-1}$	0.7211	SMP at Alto Jahuel busbar$_{t-1}$	0.6642
	SMP at Charrúa busbar$_{t-1}$	0.2199	SMP at Charrúa busbar$_{t-1}$	0.2668
	Thermal generation$_{t-1}$	0.1972	Thermal generation$_{t-1}$	0.2083
	Solar generation$_{t-1}$	0.0956	Solar generation$_{t-1}$	0.1001
	Solar generation$_{t-5}$	0.0340	SMP at Itahue busbar$_{t-1}$	0.0407
Puerto Montt	SMP at Chiloé busbar$_{t-1}$	0.4758	SMP at Chiloé busbar$_{t-1}$	0.4971
	SMP at Puerto Montt busbar$_{t-1}$	0.3548	SMP at Puerto Montt busbar$_{t-1}$	0.3763
	SMP at Valdivia busbar$_{t-1}$	0.0446	SMP at Valdivia busbar$_{t-1}$	0.0534
	SMP at Chiloé busbar$_{t-2}$	0.0148	Solar generation$_{t-1}$	0.0138
	Net demand$_{t-1}$	0.0112	SMP at Chiloé busbar$_{t-2}$	0.0126

Note: The subscript $t - n$ denotes a time lag of $n$ hours. Importance was computed by permutation on the fitted Random Forest models.
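Table 5 ranks lagged predictors by permutation importance. The sketch below illustrates both ingredients under stated assumptions: `lagged_rows` builds inputs such as Solar generation$_{t-5}$ from an hourly series, and `permutation_importance_r2` shuffles one feature column at a time and records the mean drop in $R^{2}$. $R^{2}$ is assumed as the score because it is scikit-learn's default for regressors; the paper's exact scoring and number of repeats are not stated here. Both function names are hypothetical; with scikit-learn, `sklearn.inspection.permutation_importance` serves the same purpose for a fitted `RandomForestRegressor`.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def lagged_rows(series: Sequence[float], lags: Sequence[int]) -> List[Row]:
    """One row per hour t >= max(lags), holding [x_{t-l} for each lag l]."""
    start = max(lags)
    return [[series[t - l] for l in lags] for t in range(start, len(series))]


def r2_score(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((v - p) ** 2 for v, p in zip(y, y_hat))
    return 1.0 - ss_res / ss_tot


def permutation_importance_r2(predict: Callable[[List[Row]], List[float]],
                              X: List[Row], y: Sequence[float],
                              n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in R^2 when each feature column is shuffled (assumed score)."""
    rng = random.Random(seed)
    baseline = r2_score(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - r2_score(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model ignores scores exactly zero, while a feature the predictions depend on scores positive. Because each column is permuted separately, the importances need not sum to one, which is consistent with the Alto Jahuel values in Table 5.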



MDPI and ACS Style

León, R.; Ramírez, G.; Cifuentes, C.; Vergara, S.; Aedo-García, R.; Lanyon, F.R.; Martin, R.J.V.S. Data-Driven Forecasting of Electricity Prices in Chile Using Machine Learning. Appl. Sci. 2026, 16, 1318. https://doi.org/10.3390/app16031318


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
