1. Introduction
Forecasting electricity demand is a critical task for transmission system operators, who must continuously ensure real-time energy balance and adjust supply and delivery in accordance with technical constraints and actual demand. Energy providers rely on both short-term forecasts (spanning minutes, hours, or days) and long-term forecasts (extending beyond one year) [
1].
Short-term peak demand forecasting is particularly important, as it enables the activation of demand-side response (DSR) programs under contractual agreements with strategic consumers to reduce or limit power consumption [
2,
3]. In this context, peak shaving constitutes a key strategy in smart grids, aimed at minimizing peak demand and reallocating energy use to off-peak periods. In addition, electricity suppliers may exercise prearranged options for demand reduction, whereby customers voluntarily curtail electricity consumption—typically during peak periods—in return for financial compensation.
Long-term peak demand forecasting serves two essential functions [
4]. First, it informs scheduling for maintenance and modernization of aging transmission infrastructure. Second, it supports strategic planning, including investment decisions for new generation capacity and the negotiation of long-term procurement contracts.
Electricity demand exhibits distinct cyclical patterns, typically recurring on daily, weekly, and annual scales. Demand variability among commercial consumers is largely driven by meteorological conditions, operational schedules, and sector-specific activities or equipment usage. Extreme weather events amplify these fluctuations; during heat waves, elevated electricity consumption from widespread air conditioning coincides with reduced power plant cooling efficiency due to lower river water levels, necessitating output curtailment. Conversely, during winter, prolonged low temperatures intensify lighting and heating needs, thereby increasing hourly load variability across the transmission system.
A wide range of methodologies has been explored for peak load forecasting, with varying degrees of accuracy. Broadly, these approaches fall into two groups: (1) statistical and artificial intelligence predictive models and (2) classification methods that recast the problem as one of pattern recognition. Statistical models, including autoregressive models, linear and nonlinear regression, as well as ARMA and ARIMA frameworks, are comparatively straightforward to interpret. In such models, deterministic components capture seasonal patterns, while stochastic components represent noise. These methods generally provide high predictive accuracy. In contrast, classification approaches treat peak detection as a recognition problem rather than an estimation task.
In this article, one-day-ahead peak demand prediction for commercial consumers is formulated, and the predictive model is constructed using feature sets derived from high-resolution historical electricity consumption records and meteorological variables. The increasing availability of large-scale, heterogeneous energy datasets characterized by high volumes and temporal granularity enables the application of data-driven machine learning methods for intelligent pattern recognition in electricity demand behavior. The proposed approach leverages these big data resources to identify peak demand events at the level of individual commercial customers, providing actionable insights for demand-side management (DSM) and peak shifting strategies. Such predictive capabilities support data-driven decision making in smart grid environments, allowing commercial consumers to mitigate peak loads, improve operational energy efficiency, and contribute to more flexible and resilient power system operation.
In general, peak electricity demand is concentrated within a relatively short duration, typically lasting three to four hours each day and most often between 11:00 a.m. and 3:00 p.m. DSM strategies for peak shifting focus on reducing energy consumption during these high-demand intervals or peak hours, in contrast to off-peak hours, as illustrated in [
Figure 1]. Due to pricing mechanisms applied by utilities—where electricity is sold at a premium during peak hours—commercial and industrial consumers can be incentivized, when informed in advance, to shift portions of their load to off-peak periods characterized by lower tariffs. Peak clipping is commonly achieved through direct load control, whereby selected processes are curtailed or temporarily suspended to alleviate demand during peak periods.
The novelty of this work lies in its four principal contributions to the literature on peak demand management within the context of big data analytics in smart energy systems.
First, while binary classification approaches have been explored in the literature, most peak demand studies focus on system-level or feeder-level forecasting or on estimating peak load magnitudes using regression or time series methods. In contrast, this paper reformulates peak demand identification as an individual, customer-level event detection problem, where peaks are defined relative to each customer’s own historical consumption distribution. This shift is non-trivial, as it enables DSM actions to be targeted selectively at customers who are likely to contribute to upcoming peak events, rather than issuing coarse, system-wide signals.
Second, customer-level modeling in prior work is often limited to small samples, specific buildings, or appliance-level datasets, frequently relying on highly contextual or survey-based information. In contrast, our study demonstrates the feasibility of peak event detection using large-scale, real-world smart meter data from commercial customers without requiring intrusive or customer-specific instrumentation. This large-scale empirical validation under real tariff conditions is still relatively scarce in the literature and represents a meaningful step toward operational applicability.
Third, the proposed framework explicitly decouples event detection from downstream DSM actions. Rather than embedding assumptions about peak reduction strategies, pricing schemes, or flexibility mechanisms into the model itself, this paper positions peak detection as an upstream analytical capability that can support multiple DSM objectives, including peak shaving, flexible demand activation, or incentive-based programs.
Finally, the manuscript demonstrates that high peak event detection performance can be achieved using relatively simple models, which has important practical implications. In large-scale DSM deployments, robustness and computational efficiency are often as important as marginal gains in accuracy. Showing that complex architectures are not strictly necessary to achieve operationally useful results provides a valuable counterpoint to the prevailing focus on increasingly sophisticated models.
The remainder of this paper is organized as follows.
Section 2 presents a comprehensive review of the relevant literature that addresses analogous problems.
Section 3 provides the characteristics of the dataset and its formulation within the framework of a binary classification task.
Section 4 provides the theoretical foundations of the classification models employed.
Section 5 delineates the performance evaluation metrics adopted for model testing.
Section 6 details the experimental design and reports the results of the experiments. Finally,
Section 7 concludes the paper and discusses potential directions for future investigation.
2. Literature Review
Peak load management represents a systematic approach to energy regulation intended to mitigate discrepancies between electricity supply and demand, thereby enhancing grid stability and reducing tariff volatility driven by demand fluctuations.
According to [
5], the evolution of methods for electricity peak demand forecasting can be divided into three distinct stages. The first, the manual or expert-driven stage, emerged in the late 1950s, though relatively few studies were conducted during this period. The second, the classical stage, beginning in the early 1970s, was dominated by regression techniques [
6], with additional contributions from time series analysis [
7], stochastic time series models [
8], and exponential smoothing [
9]. The third, the most advanced stage, starting in the early 1990s, has been characterized by a strong preference for predictive methods employing artificial neural networks (ANNs) [
10,
11], including deep learning [
12], genetic algorithms (GAs) [
13], support vector machines (SVMs) [
14], and ensemble methods (boosting, bagging, and random forests) [
15,
16].
The categorization of peak load forecasting methods into manual, classical, and advanced broadly reflects the evolution of electricity systems [
5]. During the manual and classical stages, electricity markets were largely shaped by traditional energy-intensive industries, resulting in relatively stable peak demand patterns that could be effectively addressed using statistical approaches. The transition toward smart grids and a changing energy landscape has intensified demand-side drivers (e.g., demand-side management and electric vehicle penetration) and supply-side drivers (e.g., intermittent renewable resources at both the transmission and distribution levels), resulting in greater variability and reduced predictability of the peak demand. Consequently, advanced methods that leverage massive data volumes and are capable of capturing complex, nonlinear patterns, such as deep learning and machine learning techniques, have become increasingly favored.
Demand-side management constitutes another extensively adopted and effective strategy, particularly within intelligent energy systems, aimed at optimizing consumption patterns, alleviating peak load stress, and minimizing overall electricity expenditures [17, 18, 19]. DSM frameworks are conventionally categorized into energy efficiency interventions and demand response programs [20].
Energy efficiency measures emphasize the reduction of energy utilization during system operation and production processes, yielding significant economic advantages for both utilities and end users. In contrast, the temporal variability of electricity demand requires the implementation of demand response (DR) schemes, which actively reconfigure consumption behaviors. DR initiatives not only provide economic incentives for end users to modify usage patterns but also support utilities in mitigating peak demand pressures. Generally, DR methodologies are grouped into two primary categories: incentive-based and price-based mechanisms.
Incentive-based demand response (IBDR) programs provide financial compensation to consumers who voluntarily reduce electricity consumption during peak demand or critical system events [
21,
22]. These programs are generally classified into classical schemes, including direct load control (DLC) and load curtailment, and market-based mechanisms, including emergency demand response and demand bidding. Classical IBDR compensates participants through bill credits or discounts, whereas market-based approaches reward consumers based on actual load reduction performance.
DLC programs primarily target residential and small commercial users by enabling utilities to remotely control registered appliances (e.g., heaters or air conditioners) to achieve peak shaving and load curve smoothing, though their scalability is contingent upon consumer acceptance. Load curtailment programs, which are more common in industrial and large commercial sectors, require contractual agreements in which participants commit to reducing load during system stress conditions; incentives are granted for compliance, while penalties are imposed for non-compliance [23].
Price-based demand response (PBDR) leverages dynamic electricity pricing as a control mechanism to limit overall energy use and shift demand away from peak periods to reduce system costs. The principal pricing mechanisms applied in PBDR include time of use (ToU), real-time pricing (RTP), the inclining block rate (IBR), and critical peak pricing (CPP). While PBDR was initially adopted in industrial contexts to reduce production-related energy expenses, its integration into broader smart environments was limited by inadequate monitoring capabilities. The deployment of advanced metering infrastructures (AMIs) has since enabled real-time monitoring of consumption, facilitating more widespread adoption.
The ToU pricing mechanism partitions days into peak, mid-peak, and off-peak intervals, offering lower tariffs during off-peak hours to encourage load shifting. This approach is simple, widely accepted, and provides stable participation, making it one of the most favored PBDR strategies [
24,
25]. However, extensive shifting can unintentionally generate secondary peaks, which must be mitigated through careful system design. In contrast, RTP reflects wholesale market prices nearly in real time, transmitting hourly or daily price signals to consumers. While it offers high theoretical efficiency and allows retailers to balance distribution with reduced risk, its effectiveness relies on fully deployed smart metering infrastructure and on consumer responsiveness, which remains limited, especially in residential sectors [
26].
IBR programs employ tiered pricing structures, where consumption above a predefined threshold incurs higher tariffs [
27]. Thresholds can be specified on hourly, daily, or monthly bases, and this model has historically helped utilities reduce peak-to-average ratios (PARs) while minimizing the need for costly infrastructure investments. Finally, CPP focuses on ensuring system reliability by imposing substantially higher prices during critical peak events caused by extreme demand or elevated wholesale prices [
28]. CPP is not applied daily but can be integrated with ToU pricing, rewarding consumers who reduce or shift loads during stress events through discounted rates in non-critical periods. Although effective at peak reduction, CPP is less advantageous for long-term cost minimization.
Many additional examples of peak load forecasting methods and their applications in demand-side management programs can be found in the literature. Given the rapidly growing availability of large-scale smart meter datasets and big data analytics in energy systems, the one-day-ahead peak demand prediction approach for commercial consumers proposed in this study may constitute a useful alternative for modeling electricity demand. By leveraging data-driven machine learning techniques and high-resolution consumption data, the framework proposed in this paper illustrates how advanced analytics applied to large and heterogeneous energy datasets can enhance peak demand identification.
It should be noted that the analyses rely on unique, large-scale real-world datasets derived from commercial electricity consumers. Access to such fine-grained commercial energy data generated through advanced metering infrastructure remains highly restricted for research purposes due to privacy and market constraints. Therefore, this study contributes to addressing an important research gap related to the limited availability and utilization of commercial big data in electricity demand analysis, highlighting the value of real-world datasets for developing and validating scalable machine learning models in smart grid environments.
3. Dataset Characteristics
In this study, a historical dataset comprising electricity consumption records of commercial entities connected to the Polish power grid was analyzed. The dataset contained measurements of hourly electricity use for each company over a two-year period, spanning from January 2018 to December 2019. In total, the dataset included 19,714 unique commercial customers from central eastern Poland. Data were provided by Data Bridge, a company specialized in collecting data from electricity suppliers and providing analytics for the industry. This dataset provided a high-resolution temporal profile of electricity demand, allowing a detailed analysis of consumption patterns and facilitating the development of predictive models for peak load classification.
The initial dataset for 2018 consisted of 19,714 commercial customers. Data preprocessing involved the removal of customers exhibiting negative consumption values, constant load profiles, or incomplete measurement records. After filtering, the final 2018 dataset comprised 39,347,684 observations corresponding to 4924 customers. The 2019 dataset was restricted to customers present in the cleaned 2018 dataset to ensure consistency across years. Following the same preprocessing steps, the final 2019 dataset contained 26,336,025 observations from 3015 customers. Incorporating customers with incomplete records is a potential avenue for further research.
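The three exclusion rules described above can be sketched as follows. This is a minimal illustration only: the data layout (a dictionary mapping a customer identifier to a list of hourly readings, with `None` marking a missing reading) and the function names are assumptions and do not reflect the R implementation used in the study.

```python
from typing import Dict, List, Optional


def is_valid_profile(readings: List[Optional[float]]) -> bool:
    """Return True if an hourly load profile passes all three filters."""
    # Incomplete record: empty profile or any missing (None) hourly reading.
    if not readings or any(r is None for r in readings):
        return False
    # Negative consumption values.
    if any(r < 0 for r in readings):
        return False
    # Constant load profile: every reading is identical.
    if len(set(readings)) == 1:
        return False
    return True


def filter_customers(
    profiles: Dict[str, List[Optional[float]]]
) -> Dict[str, List[Optional[float]]]:
    """Keep only customers whose profiles pass the filters."""
    return {cid: r for cid, r in profiles.items() if is_valid_profile(r)}


# Hypothetical toy data: only customer "A" should survive.
profiles = {
    "A": [1.0, 2.5, 3.0, 2.0],   # valid
    "B": [1.0, -0.5, 3.0, 2.0],  # negative reading
    "C": [2.0, 2.0, 2.0, 2.0],   # constant profile
    "D": [1.0, None, 3.0, 2.0],  # incomplete record
}
clean = filter_customers(profiles)
```

Applying the same function to the 2019 data, after restricting it to the customers retained for 2018, would mirror the two-step procedure described in the text.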
For commercial customers connected to the low-voltage network, utility providers offer Tariff C, which is subdivided into several variants based on the contracted power level. Tariff C is intended for small- and medium-sized enterprises, including retail, commercial, and service outlets, as well as farms that use electricity for production purposes, such as greenhouses, poultry farms, piggeries, or cold storage facilities. Customers under this tariff are classified into two main groups: (1) C1x, with contracted power below 40 kW, and (2) C2x, with contracted power above 40 kW.
The dataset used in this research was restricted to consumers classified under Tariff C22a, which is a two-zone tariff where electricity prices vary by time of day and month (see
Table 1). The peak and off-peak periods used in this study are defined according to the C22a tariff structure and reflect the time zones specified by the electricity provider, rather than an endogenous or optimized temporal segmentation.
At an aggregated level, the customer loads exhibited distinct daily and weekly seasonality [
Figure 2]. The daily load profiles varied in shape, depending on the type of day (workday, Saturday, or Sunday), while the weekly patterns were driven by the sequence of workdays and holidays. As illustrated in [
Figure 3], weekday consumption (e.g., on a Friday) followed a smooth curve, with minimal load during the evening, night, and early morning hours and a pronounced peak in the middle of the day. In contrast, the Saturday consumption levels were substantially lower and lacked clearly defined peaks. The daily load characteristics, including the profile shape and demand level, were also influenced by prevailing weather conditions such as temperature, wind speed, cloud cover, humidity, precipitation, and daylight duration.
Although the aggregated load profile for the customers was smooth, the individual load shape for the commercial customers may vary significantly, depending on their production or business activity. [
Figure 4 and
Figure 5] present example load profiles with irregular and regular fluctuations, respectively, for the selected customers. The customer in [
Figure 4] had an electricity load profile with no clear cycles or recurrent patterns and substantial electricity usage, ranging between 300 kWh and 1700 kWh. By contrast, the customer in [Figure 5] had relatively small electricity usage, ranging between 1 kWh and 11 kWh, with clearly defined and regular monthly and weekly cycles.
Finally, meteorological conditions constitute a critical exogenous factor in load forecasting. Among these, air temperature and humidity are most influential [
Figure 6]. Temperature affects not only the demand but also the properties of the transmission line, as elevated temperatures increase the conductor resistance, alter the reactance, and induce thermal expansion of the line length, thereby reducing the transmission capacity. The relationship between temperature and load is strongly seasonal. In summer, load and temperature are positively correlated, as higher temperatures increase the electricity demand for cooling (air conditioning and ventilation). In winter, an inverse relationship was observed, with lower temperatures driving higher electricity consumption for heating.
Humidity is another meteorological variable that influences electricity demand. Although it does not alter the actual air temperature, elevated humidity increases the perceived (apparent) temperature. For instance, an ambient temperature of 30 °C may be perceived as 35 °C or more under high humidity conditions. Although physical temperature remains unchanged, humidity amplifies the thermal discomfort experienced during hot weather.
In addition, given the characteristics of electricity consumption and the objectives of the analysis, the original dataset was filtered to reduce noise: all companies with negative readings, constant values, or incomplete readings were excluded from the subsequent analysis.
4. Classification Models
The primary objective of this analysis is to mitigate extreme peak loads in the Polish power system by reducing the peaks of individual customers. To achieve this goal, three models were examined:
Random forest;
Multilayer perceptron;
Logistic regression.
The multilayer perceptron (MLP), which builds on the perceptron introduced in 1958, remains one of the most commonly used neural network architectures for both classification and regression problems [29, 30]. Its structure is relatively straightforward, consisting of an input layer, one or more hidden layers, and an output layer, where the number of input nodes matches the dimensionality of the input variables. Each layer contains at least one neuron, and each neuron is fully connected to all neurons in the subsequent layer. The configuration of the output layer depends on the nature of the problem, namely whether it involves classification or regression. The neural network possesses the following functional properties and capabilities [31]:
Nonlinearity, enabling the modeling of complex relationships;
Input-output mapping by capturing dependencies;
Adaptivity, allowing learning from data;
Evidential response, providing confidence-related outputs;
Contextual information processing, incorporating surrounding patterns;
Fault tolerance, ensuring robustness to noise and missing data;
Uniformity of analysis, applying a consistent computational framework.
One limitation of the MLP is that it requires all input variables to be numeric. Consequently, categorical variables must be transformed into dummy (one-hot encoded) variables before being used in the model. A key challenge in training an MLP lies in selecting appropriate weights for the network. Initially, weights are assigned randomly. During training, the difference between the predicted and actual values is used to compute the error. This error is then propagated backward across the network layers, a process known as backpropagation, to update the weights in each layer accordingly [32]. MLPs are capable of modeling complex relationships and can solve problems that are not linearly separable. Moreover, neural networks are universal function approximators, meaning they can approximate any continuous function given sufficient model complexity. An MLP with two hidden layers composed of 37 and 17 neurons was used in this study. The rectified linear unit (ReLU) activation function was applied to all hidden layers to enable efficient learning of nonlinear relationships. Model training was performed using the Adam optimizer with a learning rate of 0.001, providing stable and efficient convergence.
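To make the architecture concrete, the following sketch runs a single forward pass through a network with two ReLU hidden layers of 37 and 17 neurons and one sigmoid output unit. It is an illustration under stated assumptions, not the study's R implementation: the input width of 74 is borrowed from the number of analyzed features and may differ after one-hot encoding, the weight initialization scheme is a hypothetical choice, and training with Adam is not shown.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random initial weights and zero biases for one fully connected layer."""
    scale = math.sqrt(2.0 / n_in)  # He-style scaling (an assumption)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases):
    """Affine transform: one output per row of the weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, z) for z in v]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(x, layers):
    """ReLU on every hidden layer, sigmoid on the single output unit."""
    h = x
    for weights, biases in layers[:-1]:
        h = relu(dense(h, weights, biases))
    weights, biases = layers[-1]
    return sigmoid(dense(h, weights, biases)[0])


rng = random.Random(0)
N_FEATURES = 74                  # assumed input width
sizes = [N_FEATURES, 37, 17, 1]  # input, two hidden layers, output
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
x = [rng.random() for _ in range(N_FEATURES)]  # synthetic feature vector
score = forward(x, layers)       # peak score between 0 and 1
```

With untrained random weights the score carries no meaning; the sketch only shows how the layer sizes and activations fit together.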
The random forest, in contrast, is a powerful ensemble learning technique that combines multiple decision trees to enhance predictive accuracy and reduce overfitting. It was introduced by Leo Breiman in 2001 as an extension of bagging (bootstrap aggregating), with the added benefit of random feature selection at each split, which enhances model diversity and robustness [
33,
34].
The algorithm operates by generating numerous decision trees in the training stage, where each tree is fitted to a bootstrap sample drawn with replacement from the original dataset. Such a sample contains, on average, about 63% of the distinct observations. At every node in the tree, a randomly sampled subset of features is evaluated to determine the most informative split. This randomness helps decorrelate the trees, making the ensemble more resilient to noise and overfitting [35].
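The figure of roughly 63% follows from a short calculation, assuming a bootstrap sample of size $n$ drawn uniformly with replacement from $n$ observations. A given observation is missed by a single draw with probability $1 - 1/n$, so the probability that it appears at least once in the sample is:

```latex
\[
P(\text{observation } i \text{ appears in the bootstrap sample})
  = 1 - \left(1 - \frac{1}{n}\right)^{n}
  \;\xrightarrow[\,n \to \infty\,]{}\; 1 - e^{-1} \approx 0.632 .
\]
```

The remaining share of about 37% of observations is left out of each tree's sample.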
Random forests are capable of handling both classification and regression tasks. In classification tasks, the final outcome is determined by majority voting across all trees, while for regression, predictions are averaged. The method is non-parametric and can model complex, nonlinear relationships without requiring assumptions about the underlying data distribution [
33]. The key advantages of random forests include the following [
36,
37,
38]:
High accuracy and generalization capability;
Robustness to overfitting, especially with large datasets;
Automatic handling of missing values and outliers;
Feature importance estimation, which aids in model interpretability;
Scalability for large datasets and parallel computation.
Logistic regression, the last method considered in this analysis, is a member of the generalized linear model family and is especially appropriate for classification problems with discrete response variables [
39]. The core idea behind logistic regression is to estimate the probability of an input being assigned to a particular class. This is achieved by modeling the log odds of the outcome as a linear combination of the independent variables. The logistic (sigmoid) function is then applied to map these log odds to a probability value between 0 and 1 [
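40]. Mathematically, the logistic regression model is defined as follows:
\[ p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}} \]
where $p$ is the probability of the positive class, $x_1, \dots, x_n$ are the predictor variables, and $\beta_0, \beta_1, \dots, \beta_n$ are the model coefficients. The main advantages of logistic regression can be listed as follows [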
40,
41]:
The coefficients in logistic regression can be interpreted in terms of odds ratios;
Logistic regression is computationally efficient and performs well on small to moderately sized datasets;
The model provides well-calibrated probability estimates;
It can be extended to multiclass problems.
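The mapping from log odds to a probability, and the odds-ratio reading of the coefficients, can be sketched as below. The intercept, coefficients, and input values are hypothetical numbers chosen for illustration and are not estimates from the study.

```python
import math


def predict_proba(x, intercept, coefs):
    """Probability of the positive class from a linear log-odds model."""
    log_odds = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds_ratios(coefs):
    """Multiplicative change in the odds per unit increase of each predictor."""
    return [math.exp(b) for b in coefs]


# Hypothetical model with two predictors.
intercept = -2.0
coefs = [0.8, -0.3]
x = [1.5, 2.0]

p = predict_proba(x, intercept, coefs)  # log odds = -2.0 + 1.2 - 0.6 = -1.4
ratios = odds_ratios(coefs)             # exp(0.8) and exp(-0.3)
```

A ratio above 1 means that a one-unit increase in the predictor raises the odds of the positive class, and a ratio below 1 means that it lowers them.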
5. Accuracy Measures
A rigorous evaluation of classification models requires the use of multiple performance metrics. Relying on a single metric may lead to misleading conclusions, particularly in cases of class imbalance or asymmetric error costs. Therefore, a comprehensive evaluation framework should incorporate several complementary measures. In this study, the following metrics were used to evaluate model performance: sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and balanced accuracy, which provides a more equitable assessment by averaging sensitivity and specificity [
42]. These metrics are particularly relevant in the context of electricity peak identification, which can be framed as a binary classification task. The outcomes of the classification models were evaluated through a confusion matrix, offering a detailed classification of true and false predictions. This matrix, presented in
Table 2, serves as a foundational tool for interpreting model behavior and guiding further optimization.
Based on the table presented above, the sensitivity measure is defined as the ratio of true positive records to all real positive records:
\[ \text{Sensitivity} = \frac{TP}{TP + FN} \]
The specificity measure is defined as the ratio of true negative records to all real negative records:
\[ \text{Specificity} = \frac{TN}{TN + FP} \]
where $TP$, $FN$, $TN$, and $FP$ denote the numbers of true positive, false negative, true negative, and false positive records in the confusion matrix. For an imbalanced dataset, a useful metric is the balanced accuracy, defined as the average sensitivity across all classes [43, 44]:
\[ \text{Balanced accuracy} = \frac{\text{Sensitivity} + \text{Specificity}}{2} \]
The balanced accuracy gives an equitable evaluation by assuming equal importance for the minority and majority classes. This metric is used in critical analyses such as medical diagnosis, where missing positive records may have serious consequences. The score values generated by the models (MLP or RF) are spread over 0–1, and the cut-off should be used to specify the final class assignment. In this paper, the following cut-off was used:
The area under the receiver operating characteristic curve (AUC-ROC) is a threshold-independent performance metric that quantifies a classifier’s ability to discriminate between peak and non-peak events across all possible decision thresholds. The ROC curve plots the true positive rate against the false positive rate, and the AUC summarizes this relationship into a single scalar value between 0 and 1. An AUC value of 0.5 indicates no discriminative power, while values closer to 1 reflect stronger separability between classes.
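The threshold-dependent measures of this section can be computed from model scores as in the following sketch. The labels and scores are synthetic, and the cut-off of 0.5 is a hypothetical placeholder, not the value used in the paper.

```python
def confusion_counts(y_true, scores, cutoff):
    """Count TP, FP, TN, FN after thresholding the scores at the cut-off."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= cutoff else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """True positives over all real positives."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negatives over all real negatives."""
    return tn / (tn + fp)


def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity and specificity."""
    return 0.5 * (sensitivity(tp, fn) + specificity(tn, fp))


# Synthetic example: 3 peak hours (1) and 5 non-peak hours (0).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
CUTOFF = 0.5  # hypothetical placeholder

tp, fp, tn, fn = confusion_counts(y_true, scores, CUTOFF)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
bal_acc = balanced_accuracy(tp, fp, tn, fn)
```

In this toy case two of the three peaks are detected and one of the five non-peak hours is flagged, so the balanced accuracy is the mean of 2/3 and 4/5.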
6. Data Simulations
6.1. Validity Threats
Overall, no threats to the validity of the findings were identified in this study. The analysis was based on numerical simulations implemented in R, a widely used open-source programming environment. To support the exploration of underlying relationships, the results were reported using both tabular summaries and graphical visualizations. However, a potential obstacle to fully comprehending the proposed methodology and results may be the reader’s level of expertise in computer technologies, statistical and machine learning techniques, as well as familiarity with the chosen programming tools.
The proposed framework nonetheless has several limitations. First, despite the application of strict chronological data splitting to mitigate information leakage, rolling-window feature construction and threshold-based labeling require careful execution. Second, peak event definitions relying on rolling percentiles are sensitive to parameter selection and may produce different results under alternative demand-side management (DSM) objectives. Third, the data preprocessing steps may introduce selection bias by favoring customers with stable and complete load profiles.
Finally, the findings are specific to the Polish C22a tariff, the studied region, and the 2018–2019 timeframe, and they should not be generalized without appropriate adjustments to other tariff structures, regulatory environments, or temporal settings.
6.2. Computational Resources
The numerical experiments were performed on a computer with the following specification: an AMD Ryzen Threadripper 2990WX 32-Core Processor, 64-bit CPU, and 1250 GB of RAM. To implement the considered models, R-CRAN software was used [
45]. All methods were trained without any under-sampling or over-sampling methods. The presented results should be interpreted independently of the absolute execution times, which may differ significantly across software implementations, hardware configurations, and deployment architectures. In operational DSM systems, alternative implementations and scalable infrastructures may be used to meet specific performance requirements.
6.3. Variable Analysis
To forecast peak demand, a feature vector was constructed using the attributes listed in [
Table 3]. These attributes were derived from hourly electricity demand time series. In addition, supplementary variables, including temperature, humidity, and calendar-based features, were incorporated.
Feature importance was assessed using the non-parametric Mann–Whitney U test. For each feature, the distributions of the feature values between the two outcome groups were compared independently. The Mann–Whitney U test was selected because it does not assume normality and is robust to differences in distribution shape and variance. Features yielding low p values indicate a statistically significant difference between groups and are therefore considered more informative for distinguishing the peak versus non-peak events.
Beyond hypothesis testing, the Mann–Whitney U statistic was leveraged to quantify the effect size by converting it into an area under the receiver operating characteristic curve (AUC-ROC). For each feature, the AUC was computed directly from the U statistic as $\mathrm{AUC} = U/(n_1 \times n_0)$, where $n_1$ and $n_0$ denote the number of samples in the positive and negative classes, respectively.
Features were ranked according to their AUC-ROC values, providing an interpretable measure of relative feature importance.
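The conversion from the U statistic to an AUC can be sketched as follows. This is a minimal pure-Python illustration, not the implementation used in the study: the function names are hypothetical, U is computed from the rank sum of the positive class, and tied values receive midranks.

```python
from typing import Dict, List, Sequence, Tuple


def mann_whitney_auc(positive: Sequence[float], negative: Sequence[float]) -> float:
    """AUC = U / (n_pos * n_neg), with U taken from the positive-class rank sum."""
    n_pos, n_neg = len(positive), len(negative)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both groups must be non-empty")
    pooled = sorted(
        [(v, 1) for v in positive] + [(v, 0) for v in negative],
        key=lambda pair: pair[0],
    )
    rank_sum_pos = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1 .. j; each gets the average rank.
        midrank = (i + 1 + j) / 2.0
        rank_sum_pos += midrank * sum(label for _, label in pooled[i:j])
        i = j
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def rank_features(
    groups: Dict[str, Tuple[Sequence[float], Sequence[float]]]
) -> List[Tuple[str, float]]:
    """Sort features by AUC, highest first. Values map name -> (peak, non-peak)."""
    scored = [(name, mann_whitney_auc(pos, neg)) for name, (pos, neg) in groups.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Illustrative values only: feature "a" separates the groups better than "b".
toy = {
    "a": ([3.0, 4.0, 5.0], [1.0, 2.0, 3.0]),
    "b": ([1.0, 4.0, 2.0], [3.0, 2.0, 5.0]),
}
ranking = rank_features(toy)
print(ranking)
```

An AUC of 0.5 corresponds to no separation between the groups. A feature with an AUC well below 0.5 is also informative, only with the opposite sign, so a ranking based on the raw AUC implicitly favors features that take larger values during peak events.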
Table 4 shows the top 10 features with the highest AUC values. The highest AUC of 0.8543 was obtained for the current-hour energy feature, confirming that the current load level is highly informative for identifying peak hours 24 h ahead. Lagged load features at longer temporal horizons, particularly energy_144 (6 days), energy_120 (5 days), energy_168 (7 days), and energy_96 (4 days), also ranked among the most important predictors. This pattern highlights the strong influence of weekly and multi-day seasonality in electricity demand, reflecting recurring behavioral and operational cycles.
Shorter lags aligned with the daily cycle, such as energy_24, energy_48, and energy_72, likewise demonstrated high AUC values, underscoring the relevance of diurnal and multi-day persistence effects in load dynamics. In contrast, extremely short lags such as energy_1 and energy_23, although still statistically significant, showed slightly lower AUC values relative to the longer lags, suggesting that the immediate past load contained comparatively less discriminative information at the prediction horizon considered.
Importantly, of the 74 analyzed features, only two did not reach statistical significance, namely the electricity load values observed at 8 and 9 h earlier. The load measured at these lags corresponded to a different phase of the diurnal cycle (e.g., nighttime versus daytime) and was therefore weakly informative for predicting electricity demand at the same hour on the following day.
6.4. Peak Demand Classification
Peak demand identification is formulated in this study as an event detection problem rather than a traditional load forecasting task. A peak event is defined as an hour in which a customer’s electricity consumption exceeds a rolling, customer-specific threshold corresponding to the 97th percentile of that customer’s load distribution in the preceding month. This adaptive thresholding strategy accounts for heterogeneity in consumption levels and operational patterns across commercial customers, and it enables the identification of relative extreme events at the individual level. Consequently, the proposed models aim to predict the occurrence of peak events within a predefined forecast horizon rather than estimate the absolute demand values. This formulation is particularly well aligned with demand-side management and demand response applications, where actions are typically triggered by deviations from customer-specific baselines rather than by system-wide load levels.
Electricity load can differ substantially between companies. To account for this, the load of each company was standardized using statistics from the previous month. Companies with constant usage were excluded from the analysis because they do not contribute to increases in the maximum power load. For each company, additional features such as previous electricity usage, ratios, and the day of the week were calculated to support the modeling step. Peak load determination was performed for each company separately: a day-ahead load value was classified as a peak if it was equal to or greater than the 97th percentile of that company’s monthly load distribution, as presented in Figure 7. The day-ahead formulation gives each company a time buffer to adjust its electricity consumption in response to DSM interventions.
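The customer-specific labeling rule can be sketched as follows. The sketch assumes that the threshold is the 97th percentile of the previous month's hourly loads for one company and that an hour is a peak when its load is equal to or greater than that threshold. The percentile uses linear interpolation between order statistics, which is an assumption, since the text does not state which percentile definition was applied. Function names and data are hypothetical.

```python
import math
from typing import List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation between order statistics (q in [0, 100])."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = math.floor(position)
    upper = math.ceil(position)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def label_peaks(previous_month: Sequence[float],
                current_month: Sequence[float],
                q: float = 97.0) -> List[int]:
    """Label each hour of the current month as peak (1) or non-peak (0).

    The threshold depends only on the previous month, so the label of an
    hour never uses information from the month being labeled.
    """
    threshold = percentile(previous_month, q)
    return [1 if load >= threshold else 0 for load in current_month]


# Illustrative loads for one company (kWh per hour).
previous = [float(v) for v in range(1, 101)]
labels = label_peaks(previous, [50.0, 97.0, 98.0, 100.0])
print(percentile(previous, 97.0), labels)
```

Because the threshold is recomputed per company and per month, the same absolute load can be a peak for one customer and an ordinary hour for another.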
In addition, three benchmarking models were developed. The first (M_3_97) assigned the peak label at random, such that 3% of the observations were classified as peaks. The second (M_24) assumed that the peak status was the same as that observed for the same hour of the previous day, while the third (M_24_7) assumed it to be the same as that observed for the same hour and the same day of the previous week.
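One possible reading of the three benchmarks is sketched below. It assumes that each benchmark outputs a binary peak label per hour, that M_24 and M_24_7 copy the label observed 24 and 168 hours earlier, and that M_3_97 draws labels independently with a 3% probability. The function names, the seed, and the treatment of hours without history (labeled non-peak) are assumptions.

```python
import random
from typing import List, Sequence


def persistence_benchmark(labels: Sequence[int], lag_hours: int) -> List[int]:
    """Predict the label observed lag_hours earlier; 0 where no history exists."""
    return [labels[t - lag_hours] if t >= lag_hours else 0 for t in range(len(labels))]


def benchmark_m24(labels: Sequence[int]) -> List[int]:
    """Same hour of the previous day."""
    return persistence_benchmark(labels, 24)


def benchmark_m24_7(labels: Sequence[int]) -> List[int]:
    """Same hour and same weekday of the previous week (24 * 7 = 168 hours)."""
    return persistence_benchmark(labels, 24 * 7)


def benchmark_m3_97(n_hours: int, rate: float = 0.03, seed: int = 0) -> List[int]:
    """Random labels where each hour is a peak with probability `rate`."""
    rng = random.Random(seed)
    return [1 if rng.random() < rate else 0 for _ in range(n_hours)]


# Illustrative series: a single observed peak at hour 10.
observed = [0] * 200
observed[10] = 1
print(benchmark_m24(observed)[34], benchmark_m24_7(observed)[178])
print(sum(benchmark_m3_97(10_000)))
```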
The classification threshold used in this study was set equal to the observed class prevalence, and it should be interpreted as a reference operating point rather than an operationally optimal decision rule. In real-world DSM applications, the threshold would typically be optimized with respect to a program-specific cost or utility function that reflects the asymmetric costs of false positives (unnecessary DSM activation) and false negatives (missed peak events), which depend on tariff structures, incentive mechanisms, and regulatory conditions. Since these factors vary across electricity markets and countries, threshold optimization is inherently context-dependent. The proposed event detection framework is fully compatible with such optimization strategies, and the use of class prevalence here serves to illustrate model capability under a transparent and reproducible baseline. To prevent information leakage, all models were trained using data from 2018 and evaluated exclusively on data from 2019. A single global model was estimated per classifier using pooled customer data. Historical features were computed using only information available prior to each prediction time, including rolling customer-specific thresholds based on the preceding calendar month.
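The difference between the prevalence-based reference threshold and a cost-based threshold can be sketched as follows. The cost values, function names, and toy scores are hypothetical and do not represent any specific DSM program; the sketch simply searches the observed scores for the cut-off with the lowest total misclassification cost.

```python
from typing import List, Sequence


def prevalence(labels: Sequence[int]) -> float:
    """Share of positive (peak) observations."""
    return sum(labels) / len(labels)


def classify(scores: Sequence[float], threshold: float) -> List[int]:
    """Flag an event when the predicted probability reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def total_cost(labels: Sequence[int], scores: Sequence[float], threshold: float,
               cost_fp: float, cost_fn: float) -> float:
    """Cost of false positives (unneeded activation) plus false negatives (missed peaks)."""
    predictions = classify(scores, threshold)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    return fp * cost_fp + fn * cost_fn


def best_threshold(labels: Sequence[int], scores: Sequence[float],
                   cost_fp: float, cost_fn: float) -> float:
    """Pick the observed score that minimizes total cost when used as a cut-off."""
    candidates = sorted(set(scores))
    return min(candidates, key=lambda th: total_cost(labels, scores, th, cost_fp, cost_fn))


# Toy data: 2 peaks among 10 hours (illustrative, not study results).
y = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
p = [0.01, 0.02, 0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.60, 0.90]

reference = prevalence(y)                                    # reference operating point
print(reference, total_cost(y, p, reference, cost_fp=1.0, cost_fn=10.0))
print(best_threshold(y, p, cost_fp=1.0, cost_fn=10.0))       # missed peaks are expensive
print(best_threshold(y, p, cost_fp=10.0, cost_fn=1.0))       # false alarms are expensive
```

In practice, such a search would be run on training or validation data only, so that the 2019 evaluation set remains untouched.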
The findings of the numerical experiments are shown in [Table 5] for the 2018 data and in [Table 6] for the 2019 data.
The random forest (RF) and multilayer perceptron (MLP) models achieved notably high sensitivity in both 2018 and 2019, correctly identifying more than 90% of the actual peak events. In contrast, the benchmark methods yielded substantially lower sensitivity scores, approaching the performance of random classification. Sensitivity, defined as the ratio of true positive cases to all actual positive cases, is one of the most widely used metrics for evaluating classification models.
Specificity, which measures the proportion of true negatives among all actual negative cases, is particularly relevant for assessing the accuracy of non-peak classification. All evaluated methods achieved high specificity values; however, these results should be interpreted with caution due to the class imbalance in the dataset, which may artificially inflate specificity. When considering balanced accuracy, which accounts for both sensitivity and specificity, the findings closely mirror those observed for sensitivity.
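The effect of class imbalance on these metrics can be seen in a small worked example. The sketch below uses illustrative data (30 peaks in 1000 hours, i.e., roughly the 3% prevalence implied by the 97th-percentile rule) and a trivial classifier that never predicts a peak; the function names are hypothetical.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for binary labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        elif truth == 0 and pred == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def sensitivity(c: Dict[str, int]) -> float:
    """True positives divided by all actual positives."""
    return c["tp"] / (c["tp"] + c["fn"])


def specificity(c: Dict[str, int]) -> float:
    """True negatives divided by all actual negatives."""
    return c["tn"] / (c["tn"] + c["fp"])


def balanced_accuracy(c: Dict[str, int]) -> float:
    """Mean of sensitivity and specificity."""
    return (sensitivity(c) + specificity(c)) / 2.0


# 30 peaks among 1000 hours; the classifier never raises an alert.
truth = [1] * 30 + [0] * 970
never_peak = [0] * 1000
counts = confusion_counts(truth, never_peak)
print(sensitivity(counts), specificity(counts), balanced_accuracy(counts))
```

The trivial classifier reaches perfect specificity while detecting no peaks, which is why balanced accuracy and sensitivity are the more informative figures under strong class imbalance.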
For the analyzed detection task, the multilayer perceptron (MLP) and random forest (RF) models achieved high AUC-ROC values of 0.94 and 0.943, respectively, on the 2019 dataset, demonstrating strong discriminative capability between peak and non-peak instances. These results indicate that both models exhibited robust performance across peak and non-peak classifications.
Based on the comparative analysis across years, the MLP model was selected for subsequent stages of the study, as it exhibited a smaller performance degradation between the 2018 and 2019 datasets. Additionally, its performance remained strong when broken down by season, as shown in Table 7. Finally, the MLP model showed good precision (the share of predicted peaks that were actual peaks) and F1 score (the harmonic mean of precision and sensitivity) in 2019 under C22a-defined peak hours, as presented in Table 8.
Compared with random forests, the multilayer perceptron (MLP) exhibited smoother decision boundaries and more stable probability estimates. While this behavior was not explicitly quantified through formal calibration metrics, it may offer practical advantages in calibration-aware decision thresholding. This consideration is particularly relevant in demand-side management (DSM) applications, where probabilistic outputs are often used to support downstream decision rules rather than as absolute predictors. Notably, all evaluated classifiers demonstrated comparable peak event detection performance, reinforcing the conclusion that effective peak event identification can be achieved without the use of overly complex modeling approaches.
While the primary evaluation was conducted over a full annual horizon to reflect realistic operational deployment, additional seasonal performance metrics are reported below. In practical DSM implementations, such models are typically recalibrated or threshold-adjusted in accordance with system objectives (e.g., peak risk minimization during summer months), rather than relying on a single static configuration. As such, temporal adaptability is considered a deployment-level design choice rather than a modeling constraint.
6.5. Approach for Flattening the Peak
The objective of this study was to design and evaluate a system that mitigates peak electricity demand through predictive load management [Figure 8]. The approach assumed that commercial consumers enrolled in a demand response program (step 1) would receive a predictive alert one day in advance, indicating a potential peak load event identified by the ML algorithm (step 2). In response, customers with predicted peaks were asked to implement load reduction strategies (step 3). The effectiveness of these interventions was evaluated by comparing actual power consumption with simulated reductions at three levels (5%, 10%, and 20%), with the reduced load shifted to off-peak hours, namely between 10:00 p.m. and 8:00 a.m. (step 4). Importantly, the DSM program targeted the demand peaks identified during peak hours defined under tariff C22a, as shown in Table 1.
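The simulated shifting in step 4 can be sketched for a single customer-day. The sketch assumes that the reduced energy is spread evenly over the ten off-peak hours between 10:00 p.m. and 8:00 a.m. and that the whole day is represented as one 24-value profile, so load may be moved to earlier hours of the same calendar day. Both are simplifications introduced here; the text does not specify how the shifted energy was allocated. Function and variable names are hypothetical.

```python
from typing import Iterable, List, Sequence

# Off-peak window from 10:00 p.m. to 8:00 a.m. (hours 22, 23, 0, ..., 7).
OFF_PEAK_HOURS = [22, 23, 0, 1, 2, 3, 4, 5, 6, 7]


def shift_peak_load(load_by_hour: Sequence[float],
                    predicted_peak_hours: Iterable[int],
                    reduction: float) -> List[float]:
    """Cut the load at predicted peak hours and move it to off-peak hours.

    load_by_hour holds 24 hourly values (kWh); reduction is a fraction such
    as 0.05, 0.10, or 0.20. Predicted peaks inside the off-peak window are
    left unchanged. Total daily energy is conserved.
    """
    if len(load_by_hour) != 24:
        raise ValueError("expected 24 hourly values")
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be a fraction between 0 and 1")
    shifted = list(load_by_hour)
    moved = 0.0
    for hour in set(predicted_peak_hours):
        if hour in OFF_PEAK_HOURS:
            continue
        cut = shifted[hour] * reduction
        shifted[hour] -= cut
        moved += cut
    share = moved / len(OFF_PEAK_HOURS)  # even allocation (assumption)
    for hour in OFF_PEAK_HOURS:
        shifted[hour] += share
    return shifted


# Flat illustrative profile with predicted peaks at 12:00 and 13:00.
day = [10.0] * 24
for level in (0.05, 0.10, 0.20):
    result = shift_peak_load(day, [12, 13], level)
    print(level, round(result[12], 2), round(result[22], 2), round(sum(result), 2))
```

If many customers shift into the same window at once, the even allocation used here is exactly the synchronized behavior that can create secondary off-peak peaks.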
From a business operations perspective, aggressive load reduction, i.e., greater than 20%, may be impractical due to the nature of production processes, which limits the extent to which activities can be shifted from a peak zone to an off-peak zone. It is worth noting that secondary peaks may emerge during off-peak periods as a result of DSM programs, particularly when demand reductions from peak hours are shifted in a synchronized manner to identical off-peak time intervals. Despite this risk, secondary peaks during off-peak hours are generally less problematic than primary peaks because the total demand is typically lower, energy prices are cheaper, and there is more temporal room to redistribute load within the off-peak window.
Nonetheless, the analysis revealed that accurate peak detection, when coupled with targeted load reduction, can yield meaningful cost savings (step 5) and contribute to alleviating stress on the Polish power system, since the volume of energy shifted was significant, i.e., between 694,373 kWh and 2,777,493 kWh, depending on the scenario, as presented in [Table 9]. A detailed evaluation was conducted for four representative days in November 2018 [Figure 9, Figure 10 and Figure 11], demonstrating peak load reductions of 5%, 10%, and 20%, respectively. Moreover, the magnitude of load reduction was positively correlated with the level of financial savings achieved [Figure 12, Figure 13 and Figure 14]. Similar figures illustrating the 20% reduction scenario for four representative days in February 2018 and four days in July 2018 are presented in [Figure A1, Figure A2, Figure A3 and Figure A4].
It is important to underline that the proposed solution is not intended to function as a standalone mechanism but rather as a supportive element within a broader, integrated demand-side management strategy for any national power system.
In the 2018 dataset used for this analysis, the price difference between the peak and off-peak zones was PLN 0.2476 per kWh. This tariff differential directly influences the potential financial savings achievable through peak load reduction strategies. Although greater reductions in power consumption generally lead to increased cost savings, the primary objective from the perspective of the Polish power system is to mitigate extreme peak loads, which pose significant challenges to system stability. It is important to recognize that electricity pricing in Poland has evolved considerably in recent years, particularly following the COVID-19 pandemic and the onset of the war in Ukraine. These events have led to increased market volatility and substantial changes in tariff structures. For instance, in 2025, the price differential between peak and off-peak tariffs for one Polish electricity provider reached approximately PLN 0.43 per kWh, about 1.7 times the 2018 value. Incorporating current electricity prices into the proposed peak reduction framework could significantly enhance its economic effectiveness. The larger tariff spread in 2025 implies that the same reduction in peak consumption yields proportionally greater financial benefits, reinforcing the value of predictive and adaptive load management systems.
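The role of the tariff spread can be illustrated with simple arithmetic. The sketch assumes that the saving equals the shifted energy multiplied by the peak/off-peak price difference, ignoring fixed charges, rebound effects, and program costs. It pairs the shifted volumes and price spreads quoted in this section purely for illustration and is not intended to reproduce the values reported in Table 9.

```python
def shifting_savings(shifted_kwh: float, spread_pln_per_kwh: float) -> float:
    """Saving (PLN) from moving energy from the peak zone to the off-peak zone."""
    return shifted_kwh * spread_pln_per_kwh


SPREAD_2018 = 0.2476   # PLN per kWh, from the text
SPREAD_2025 = 0.43     # PLN per kWh, approximate value from the text

# Lowest and highest shifted volumes quoted in this section (kWh).
for volume in (694_373, 2_777_493):
    print(
        volume,
        round(shifting_savings(volume, SPREAD_2018), 2),
        round(shifting_savings(volume, SPREAD_2025), 2),
    )

# Ratio of the two spreads: the same shifted volume is worth this much more.
print(round(SPREAD_2025 / SPREAD_2018, 2))
```

Under this simplified rule, the savings scale linearly with both the shifted volume and the price spread, so the 2025 spread raises the value of any given scenario by the same factor of about 1.7.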
7. Conclusions
This study introduced a robust and scalable data-driven approach for electricity peak load classification using machine learning techniques specifically tailored to commercial consumers in the Polish power system. The proposed methodology operates within a big data analytics framework, leveraging large-scale smart meter datasets and heterogeneous external data sources. By reformulating the forecasting task as a binary classification problem, the research departs from traditional time series approaches and demonstrates the effectiveness of machine learning-based pattern recognition for extracting actionable knowledge from high-dimensional energy data and identifying peak demand events.
The use of large-scale real-world data from over 19,000 commercial entities, combined with meteorological variables obtained from distributed sensing infrastructures, enabled the development of predictive models with high sensitivity and balanced accuracy. The availability of high-resolution electricity consumption data generated by advanced metering infrastructure allows the application of scalable analytics and data-driven modeling techniques, illustrating the potential of big data technologies and machine learning methods to support intelligent demand forecasting and more effective demand-side management in modern smart grid environments.
Among the evaluated methods, the multilayer perceptron (MLP) and random forest (RF) classifiers outperformed the benchmark models, correctly identifying up to 90% of peak events with a 24-h horizon. These results underscore the potential of machine learning to improve demand-side management strategies by providing timely and accurate peak load alerts.
The proposed system facilitates proactive load reduction by commercial consumers, who can implement mitigation strategies upon receiving predictive alerts. Simulations revealed that even moderate reductions (5–20%) during predicted peak periods can lead to significant financial savings and contribute to grid stability. The economic impact of such interventions is amplified by current tariff structures, which exhibit a growing differential between peak and off-peak electricity prices. Importantly, the classification approach enables targeted DSM interventions at the individual customer level, ensuring that only those entities who contribute significantly to peak demand are engaged.
The peak-shifting scenarios analyzed in this study assumed that 5–20% of demand during detected peak events could be transferred to off-peak periods. These scenarios were intentionally simplified and should be interpreted as upper-bound estimates of potential peak reduction enabled by accurate event detection. In real-world DSM implementations, achievable load shifting is constrained by customer-specific flexibility, participation and compliance rates, technological limitations, and the risk of rebound or secondary peaks. Modeling such constraints requires detailed behavioral or appliance-level data that are not available in the present dataset and are beyond the scope of this study. Nevertheless, the proposed event detection framework is compatible with constrained or rebound-aware shifting mechanisms and can serve as an upstream input to more detailed DSM optimization models.
Although the analysis in this study emphasized peak reduction and load shifting away from high-demand periods, the proposed event detection framework is not inherently limited to this objective. In electricity systems with high penetration of renewable generation, periods traditionally associated with peak load may coincide with surplus supply or negative market prices, in which case increased demand can be beneficial to the system. The presented framework can accommodate such scenarios by redefining the target event or decision rule, enabling its use for both demand curtailment and demand stimulation strategies, depending on system conditions.
The cost-savings estimates reported in this study relied on simplified and stylized peak shifting assumptions and should therefore be interpreted as illustrative rather than predictive. They are intended to represent indicative upper-bound effects under idealized conditions and may not capture real-world constraints such as rebound effects, secondary peak formation, partial customer participation, or the operational and transaction costs of DSM deployment. Addressing these factors would require detailed modeling of load flexibility, customer behavior, and program design, which falls outside the scope of the present analysis. Accordingly, the objective of this work was not to derive fully realistic DSM outcomes but to highlight the potential economic relevance of accurate peak event detection within a simplified analytical setting.
From a policy and infrastructure perspective, the findings advocate for the integration of predictive analytics into national energy management systems. As electricity markets evolve and face increasing volatility due to geopolitical and environmental factors, adaptive and data-driven solutions like the one proposed in this study become essential. Building on the promising results of this research, future work may explore the following:
Integration of incompletely recorded commercial customer metrics into the modeling framework;
Extending the framework to residential and industrial sectors beyond tariff C22a;
Evaluating the integration of reinforcement learning for dynamic DSM optimization;
Assessing the long-term impact of predictive DSM on national power systems.
In conclusion, this study provides a scientifically sound and practically viable approach to peak load classification, offering a valuable contribution to the field of smart grid analytics and energy demand management.