1. Introduction
Smart Grids are currently being developed because of their practical applications in energy management at the level of individual buildings, institutional buildings, industrial companies, entire cities, and even countries [1]. Renewable Energy Sources (RES), such as photovoltaic systems, wind turbines, and hydroelectric power plants, generate power that is seasonal and cyclical, and they require intelligent management to balance the power grid [2]. Balancing, however, must also account for the power drawn by all electricity receivers that make up the Smart Grid [3]. Effective energy management is made possible by innovative Smart Meter systems and by wired and wireless techniques for transmitting measurement data. The big data obtained in this way can be a source of valuable information for Business Intelligence [4]. Although many generating and receiving devices are now Internet of Things devices, energy management in the Smart Grid still poses many problems [5].
Smart Grids are especially relevant in the constantly developing area of energy generation from Renewable Energy Sources. The most popular and most common RES are photovoltaic systems [6,7]. They take the form of rooftop systems [8], ground-mounted systems, carports [9], and mobile photovoltaic generators [10]. Their capacities range from 1 kWp to several MWp in large photovoltaic farms. The second most popular and widely available RES are wind turbines. Their capacities also range from several kW to several MW, and installing from several to several hundred units as wind farms scales the total capacity even further. Wind turbines, especially high-capacity ones, are installed both onshore [11] and offshore [12]. Among popular RES, hydroelectric power plants must also be mentioned. A well-known example worldwide is the hydroelectric power plant in Aswan (Egypt), and at the national level, the hydroelectric power plant on the Solina reservoir in Poland. In many countries, renewable sources supply a large part of the energy needed for power supply, and in line with the global trend, their share is growing year by year.
Smart Grids integrate power generators with their receivers. The largest consumers of electricity currently include manufacturing companies from various industries, individual households, institutional buildings, and urban infrastructure. With the latest trend of Electromobility [13,14], ever greater power and amounts of electricity are needed to charge the traction batteries of electric vehicles [15,16,17,18]. Humanity, living in the digital era, also needs ever greater amounts of energy to power data processing, including Artificial Intelligence server farms [19,20]. Owing to its scale and purpose, a Smart Grid integrates all power generators and receivers in order to supply the latter effectively. Smart Grids can be connected to the national power grid or operate without such a connection; the latter are called island or off-grid systems. In both types of Smart Grids, balancing the network is a challenge.
Effective balancing of power grids, regardless of their size, requires energy storage systems (ESS) that can accumulate large amounts of energy during periods of high RES production and release it during shortages. Currently, the most popular form of energy storage is stationary storage based on lithium-ion batteries [21], with capacities from several kWh to several MWh. Electric vehicles with the Vehicle-to-Grid (V2G) function are another important component that can perform energy storage functions for Smart Grids [22]. The innovative energy networks legally required in modern construction in many countries in Europe and around the world must take into account not only the charging of electric vehicles but also the return of energy from their traction batteries to the power grid. Traction batteries of electric vehicles can therefore perform ESS functions. Technologies for short- and long-term storage of large amounts of energy in the third decade of the 21st century also include hydrogen as an energy carrier. Hydrogen produced from surplus RES energy can be stored, transported, and used to produce electricity and heat [23,24]; hydrogen fuel cells are used for this purpose. Many other commercially available and effective energy storage methods can also be components of a Smart Grid [25].
In recent years, numerous scientific articles have described hybrid PV-hydro systems with flexible architectures that allow generating capacity to be adjusted to seasonal agricultural needs [26,27]. These studies help in understanding the potential of modular solutions for optimizing water and energy consumption on farms and for increasing their resilience to changing climatic conditions [28]. Furthermore, new research indicates that these systems can be integrated with energy storage and intelligent control, further improving their operational efficiency [29]. It can therefore be concluded that similar research challenges exist in both modern construction and modern agriculture [30]. Recent advances in integrated systems that combine photovoltaic panels with rainwater harvesting for agricultural applications further support the potential of decentralized, sustainable irrigation solutions [31].
Selecting appropriate variables in existing Smart Grid systems is crucial for effective diagnostics, control, optimization, and ensuring operational reliability. These parameters should reflect both energy production (e.g., power generated by renewable energy sources) and consumption (e.g., self-consumption, grid consumption), and also account for surpluses that can be stored or sold. Carefully selected variables enable the development of analytical models and control algorithms that improve energy management efficiency and enable early detection of system anomalies.
2. Literature Review
At present, according to the authors, the Smart Grid faces two major challenges related to its development and dissemination. The first concerns the physical integration of all components present in the Smart Grid. The second is the effective control of all components in the network in order to achieve the assumed control goals, such as ensuring a stable power supply without blackouts or achieving an assumed share of energy from renewable sources. Effective communication with all components of the Smart Grid can be used to control the entire network optimally. However, this requires advanced control and computational algorithms and a scientific approach to setting the management (control) goals. The highest level of control in the Smart Grid is an Advanced Process Control system, which operates fully automatically without human intervention [32].
Nevertheless, humans continually strive to improve processes, and Smart Grids are no exception. Smart Grids currently use advanced control algorithms that enable effective management of energy flow, integration of renewable energy sources (RES), and improved system reliability. One of the key solutions is fault detection, isolation, and restoration (FDIR) algorithms [33], which automatically identify faults and reconfigure the network, minimizing the duration of power outages. In recent years, methods based on artificial intelligence (AI) and machine learning (ML) have developed significantly, enabling load forecasting [34], optimization of energy distribution, and predictive maintenance of infrastructure. Voltage and reactive power control algorithms (Volt/Var Management System, VVMS) are used to maintain network stability, especially with a high share of distributed generation [35]. Modern SCADA/ADMS (Advanced Distribution Management System) systems use real-time data analysis algorithms, which allow for better management of the distribution network [36]. Blockchain technology in peer-to-peer (P2P) energy trading algorithms enables secure and transparent energy exchange between prosumers [37]. Multi-level Feeder Reconfiguration (MRF) algorithms optimize network topology to reduce energy losses and improve efficiency. Distributed Energy Resources (DER) coordination algorithms are used in microgrids to ensure independent operation in island mode [38]. Research on the use of 5G in the Smart Grid, in particular its Ultra-Reliable Low-Latency Communication (URLLC) service, has led to low-latency network monitoring algorithms that significantly improve the precision of network state estimation [39]. Model Predictive Control (MPC) algorithms are increasingly used to manage energy storage and load flexibility, allowing better adaptation to variable RES generation [40]. Recent developments also include hybrid algorithms that combine deep learning with optimization methods to increase the network’s resilience to disruptions and cyberattacks.
Taking into account the above review of technology and science, the authors decided to use unsupervised clustering in the three-state space (the three-dimensional space of Autoconsumption, Surplus, and Consumption from grid) to determine the power signatures of a 50 kWp photovoltaic system supplying the administrative building of the university. In the presented case study, the authors demonstrate and expert-validate the entire decision algorithm used for the current assessment of Smart Grid operation. The approach can be widely used in training Artificial Intelligence for Energy Management. Scientists continue to play a key role in training AI for the Smart Grid, providing training data and verifying and correcting the results generated by machine learning models. Energy experts define optimization criteria, such as minimizing transmission losses or power balancing, which form the basis for AI algorithms. Engineers supervise the training process, eliminating errors related to overfitting or bias, which increases the reliability of load and generation forecasts. In addition, humans introduce domain knowledge, such as the principles of network physics, into hybrid models combining deep learning with analytics, which improves their interpretability. Finally, humans evaluate the effectiveness of implemented AI solutions through tests in simulations and real systems to ensure compliance with safety and energy efficiency requirements.
3. Materials and Methods
The Smart Grid used by the authors consists of a photovoltaic system with a peak power of 50 kWp installed on the roof of a 4-story building to supply it with electricity. The building under study is the administrative part of WSEI University in Lublin, Poland. Measurement data on the power generated by the photovoltaic system and the energy produced come from a photovoltaic inverter. This device not only converts the direct current produced by the photovoltaic system into three-phase alternating current supplying the building’s electrical network, but also measures and transmits a number of parameters characterizing the energy production process. The inverter sends measurement data packets to the measurement cloud every 15 min. The inverter manufacturer’s web platform enables visualization of the received data and saving it in CSV format. Many devices can be connected to the web platform, including Energy Storage Systems and Smart Meters. A Smart Meter enables bidirectional measurement of the power flowing through a designated node in the building’s three-phase power network. In the analyzed case, the Smart Meter was installed on the building side, just before the electricity meter. It therefore measures both the energy drawn by the building from the power grid and the energy returned to the power grid. The latter occurs when the power generated by the photovoltaic system exceeds the building’s power demand; such power is called surplus. The authors therefore have access to a complete set of information for assessing the power supply of the university building from its dedicated photovoltaic system. The data used in the article cover the entire month of March 2025.
Measurement data from a Smart Grid can be effectively processed using traditional and AI-supported algorithms. The purpose of such processing is usually to assess the system’s operation, the generated power, and the amount of energy produced, and to obtain diagnostic information on the correct operation of all system components. However, specialist tools are needed to read, process, and visualize measurement data; nowadays they often take the form of analytical computer platforms or applications for mobile devices. Intelligently processed measurement data are a source of valuable information that can be used in Energy Management and/or business decision-making (Business Intelligence).
An analytical platform designed for scientific research in the domain of Smart Grids, particularly in relation to photovoltaic (PV) systems and building energy consumption, must meet a range of functional and technical requirements to support comprehensive data acquisition, processing, and interpretation. First and foremost, such a platform should be capable of integrating heterogeneous measurement sources, including photovoltaic inverters, smart meters, and environmental sensors, all of which may operate under varying data protocols such as Modbus, MQTT, OPC UA, or REST APIs. The capacity to ingest and harmonize data from these diverse sources is essential to ensure coherent time-series analysis, especially when examining the correlation between PV generation and power demand.
In terms of temporal resolution, the platform must support high-frequency data acquisition, ideally on the order of seconds or minutes, enabling researchers to capture transient phenomena and short-term fluctuations. Continuous data streams should be supported alongside batch ingestion, ensuring flexibility in both real-time monitoring and retrospective analysis. A robust Extract-Transform-Load (ETL) system is also imperative, allowing for the cleaning, filtering, interpolation, and aggregation of raw data to derive meaningful analytical variables [41].
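The cleaning and interpolation part of such an ETL stage can be sketched in pandas. This is a minimal illustration under stated assumptions, not the authors’ KNIME workflow: the helper name `clean_power_series`, the 15-min grid, and the `max_gap` threshold of two samples are hypothetical choices. Text entries are coerced to numbers, missing timestamps are exposed by reindexing onto a regular grid, and only short gaps are interpolated so that longer outages remain visible as missing data.

```python
import pandas as pd


def clean_power_series(raw: pd.Series, freq: str = "15min", max_gap: int = 2) -> pd.Series:
    """Coerce a raw power column to numbers and repair only short gaps.

    Hypothetical helper: `freq` and `max_gap` (maximum gap length, in samples)
    are illustrative choices, not values taken from the study.
    """
    # Text entries such as "n/a" become NaN instead of raising an error.
    values = pd.to_numeric(raw, errors="coerce")
    # Reindex onto a regular grid so that missing timestamps appear as NaN.
    grid = pd.date_range(values.index.min(), values.index.max(), freq=freq)
    values = values.reindex(grid)
    # Measure the length of every run of consecutive NaNs.
    is_nan = values.isna()
    run_id = is_nan.ne(is_nan.shift(fill_value=False)).cumsum()
    run_len = is_nan.groupby(run_id).transform("sum")
    # Interpolate in time, then keep the filled values only inside short gaps.
    filled = values.interpolate(method="time", limit_area="inside")
    return filled.where(~is_nan | (run_len <= max_gap))
```

The run-length mask is used instead of the `limit` argument of `interpolate`, because `limit` would still partially fill the first samples of a long outage.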
Equally important is the platform’s analytical and visualization capacity. It should provide tools for interactive exploration of temporal patterns, outlier detection, and comparative analyses, such as juxtaposing PV production curves against load profiles. Support for developing and deploying predictive models is also a critical feature, as Smart Grid research frequently involves forecasting energy generation, demand, or system behavior under different scenarios. Compatibility with machine learning libraries and statistical toolkits, such as Scikit-learn, TensorFlow, or R, facilitates the application of advanced analytical techniques.
The scalability and flexibility of the platform are vital for accommodating large datasets and diverse experimental configurations. Whether deployed on local infrastructure or in cloud environments, the system must ensure robust performance and adaptability. Moreover, the platform should support academic workflows, including the export of results in publication-ready formats, version control for datasets and models, and thorough documentation of analytical processes to ensure reproducibility and transparency.
Security and user management are also essential components, particularly when dealing with potentially sensitive energy usage data. Role-based access control, encrypted data transmission and storage, and detailed audit logs help safeguard data integrity and user accountability. Furthermore, the platform should align with relevant Smart Grid standards, such as IEC 61850 or IEEE 2030, to ensure interoperability and standard-compliant data handling [42].
Examples of platforms that incorporate many of these capabilities include open-source frameworks such as OpenEMS, time-series databases and visualization tools such as InfluxDB combined with Grafana, and scientific computing environments such as MATLAB/Simulink or Python-based ecosystems using Pandas and pvlib. Ultimately, the ideal platform should not only provide a technically sound foundation for data-intensive Smart Grid research but also foster methodological rigor and innovation in the modeling and optimization of distributed energy systems.
The authors had to select an analytical platform with the tools and functionalities needed to implement their proprietary algorithm for processing measurement data from the Smart Grid. The computational algorithm includes the following steps:
- (1) Reading the measurement data from the photovoltaic inverter (generated power) and the Smart Metering system (consumed power), saved in CSV format.
- (2) Displaying the measurement data as line plots to assess their quality (e.g., values stored as strings instead of numbers, missing samples in the time series).
- (3) Repairing any irregularities found in the recorded time series using appropriate tools.
- (4) Selecting the time series for clustering: Autoconsumption, Surplus, and Consumption from grid.
- (5) Normalizing the data to prepare them for clustering.
- (6) Unsupervised clustering of the measurement data using the k-means algorithm in the three-state space, with the number of clusters ranging from 2 to 10.
- (7) Denormalizing the data after clustering.
- (8) Assessing clustering quality with the Silhouette Coefficient and selecting the optimal number of clusters based on the Silhouette Coefficient value and expert validation.
- (9) Checking and, if necessary, converting the numeric data format after clustering.
- (10) Assigning colors to column values after clustering.
- (11) Assigning shapes to the different data categories after clustering.
- (12) Visualizing the clustering results as a heatmap.
- (13) Computing statistics of the clustering results.
- (14) Visualizing the clustering results as a bar chart.
- (15) Visualizing the clustering results as scatter plots: Surplus versus Autoconsumption, Consumption from grid versus Autoconsumption, and Consumption from grid versus Surplus.
- (16) Expert validation of the unsupervised clustering results in the three-state space.
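One plausible reading of step (13), the statistics of the clustering results, is a per-cluster summary of each power series in physical units after denormalization. The sketch below assumes a table with one row per 15-min sample, the three series selected in step (4), and a label column whose name `Cluster` is hypothetical; it is not the authors’ implementation.

```python
import pandas as pd

# Column names follow the three series selected in step (4).
FEATURES = ["Autoconsumption", "Surplus", "Consumption from grid"]


def cluster_statistics(df: pd.DataFrame, label_col: str = "Cluster") -> pd.DataFrame:
    """Per-cluster sample count and min/mean/max of each power series (W).

    `label_col` is a hypothetical name for the column holding k-means labels.
    """
    # Result has MultiIndex columns: (series name, statistic).
    return df.groupby(label_col)[FEATURES].agg(["count", "min", "mean", "max"])
```

Because each row represents 15 min, the count column also gives the time spent in each operating state (count × 0.25 h).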
The algorithm for processing measurement data and visualizing research results is given as pseudocode in Appendix A at the end of the article.
Based on past positive experience, the authors chose KNIME Analytics Platform version 5.3.2 (build of 11 September 2024). KNIME is an open-source analytics platform widely used for data integration, processing, and visualization in Smart Grid research [43]. It enables seamless import of measurement data from various sources, including IoT devices, smart meters, and photovoltaic systems. Its node-based graphical interface allows researchers to build complex data workflows without extensive programming knowledge, which makes it popular among scientists and engineers who do not program in languages such as C, C++, or Python. The platform supports data cleaning, transformation, and aggregation, which are essential for preparing high-resolution energy consumption and production datasets. It integrates with Python, R, and machine learning libraries, facilitating predictive modeling and anomaly detection in Smart Grid environments. KNIME’s visualization tools help present time-series data, correlations, and forecast results clearly and interactively. Overall, it provides a flexible and reproducible framework for analyzing energy data and supporting scientific studies of modern power systems. The implementation of the proprietary computational algorithm on the KNIME platform is shown in Figure 1.
In the context of the energy transformation and the growing importance of renewable energy sources, the analysis of measurement data from photovoltaic and load systems is becoming an important area of research within the Smart Grid concept. One promising approach to analyzing such data is unsupervised clustering, which identifies patterns and groups of behavior without requiring prior labeling of the data [44]. Such methods can enable the recognition of typical daily profiles and seasonal patterns of energy production and consumption, as well as the detection of situations that deviate from the norm, e.g., PV system failures or anomalies in the building’s energy consumption.
In the case of a university administration building, where energy generation by the photovoltaic installation occurs in parallel with dynamically changing demand for electricity, clustering can enable the classification of working days, weekends, low load periods, or consumption peaks. Unsupervised analysis can also help identify days when PV production effectively covers the building’s demand, which is important from the point of view of the self-consumption strategy and the optimization of the use of renewable energy. In turn, the detection of clusters with a significant discrepancy between production and consumption may suggest the need to use energy storage or adjust the building’s operating profile.
In addition, these methods can be used to support the forecasting of energy behavior based on historical data, which is used, among others, in demand management and energy infrastructure planning. Integration of clustering results with data on meteorological conditions, academic calendar, or user presence can significantly increase the precision of analyses and support operational and strategic decisions. In the long term, the use of unsupervised clustering can contribute to a better understanding of the energy behavior of public utility facilities and support the implementation of solutions consistent with the idea of smart and sustainable academic campuses.
This part of the article explains the authors’ choice of specific computational algorithms and measurement data processing methods. The first choice concerns min-max data normalization. In engineering studies, such as k-means analysis of energy data, min-max normalization is often preferred because it scales all variables to the same range (e.g., 0 to 1), eliminating the dominance of parameters with large numerical values and facilitating comparisons. Unlike z-score standardization, whose values are most naturally interpreted for approximately normal distributions and are not bounded to a fixed range, min-max scaling does not depend on the shape of the distribution and allows results to be read easily as a percentage of the observed range. Robust scaling reduces the influence of outliers, but in energy data these “extremes” often represent real and significant power peaks that are worth preserving. Furthermore, min-max normalization is simple to invert, so results can be presented in physical units, which is crucial for engineers. This method was also chosen because it is straightforward to implement in the KNIME platform and is consistent with the team’s previous analyses.
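Min-max normalization maps each column x to (x − min) / (max − min), and the inverse, x = z · (max − min) + min, is what allows clustering results to be reported in watts again (steps (5) and (7)). A minimal NumPy sketch follows; the function names are hypothetical and the guard for constant columns is an added assumption, not something stated in the study.

```python
import numpy as np


def minmax_fit(X: np.ndarray):
    """Per-column minimum and range of a (samples x features) array."""
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    x_range = X.max(axis=0) - x_min
    # A constant column would divide by zero; a range of 1 maps it to 0 instead.
    x_range[x_range == 0] = 1.0
    return x_min, x_range


def minmax_transform(X, x_min, x_range):
    """Scale each column to [0, 1] using the fitted minimum and range."""
    return (np.asarray(X, dtype=float) - x_min) / x_range


def minmax_inverse(Z, x_min, x_range):
    """Map normalized values (e.g. cluster centroids) back to physical units (W)."""
    return np.asarray(Z, dtype=float) * x_range + x_min
```

The fitted minimum and range must be stored, because the same values are needed to denormalize the cluster centroids after k-means.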
The second choice concerns the use of the k-means algorithm for unsupervised clustering. The authors chose k-means primarily for practical reasons: the method is simple to implement, computationally fast, and supports the analysis of large measurement sets in KNIME, the tool they use in their work. In the case presented in this article, clustering is a tool supporting rapid expert inference about the operating states of a PV installation rather than a complete exploration of all possible structures in the data, so the authors preferred an algorithm with a short runtime and easily interpretable results. Density-based methods (e.g., DBSCAN) and hierarchical methods cope better with non-convex clusters and varying point densities, but they require more complex parameter selection and are more difficult to integrate with the previously prepared normalization, denormalization, and visualization process. Furthermore, the data in this study form relatively well-separated groups in the three-dimensional space (self-consumption, surplus, grid consumption), which reduces the risk of problems typical of k-means with irregularly shaped clusters. Consequently, the authors prioritized speed and simplicity of interpretation over maximum flexibility in modeling cluster shape.
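Outside KNIME, the clustering step (6) can be sketched with scikit-learn’s `KMeans`. This assumes the three series have already been min-max scaled into an array `Z` with one row per 15-min sample; the function name, the number of restarts, and the fixed seed are illustrative assumptions, and KNIME’s k-Means node may use different initialization settings, so labels need not match exactly.

```python
import numpy as np
from sklearn.cluster import KMeans


def kmeans_states(Z: np.ndarray, k: int, seed: int = 0):
    """Cluster min-max-scaled (Autoconsumption, Surplus, Consumption from grid) rows.

    Returns the cluster label of each 15-min sample and the k centroids,
    still in normalized units.
    """
    # n_init runs k-means from several initializations (k-means++ by default)
    # and keeps the run with the lowest inertia.
    model = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = model.fit_predict(Z)
    return labels, model.cluster_centers_
```

Fixing `random_state` makes the partition reproducible, which matters when the same partition is later validated by an expert.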
The third choice concerns the use of the Silhouette coefficient to assess clustering quality. The authors used only the Silhouette coefficient because it is intuitive to interpret, assesses cluster compactness and separation simultaneously, and yields a value in a fixed range from −1 to 1, which facilitates rapid comparison of different partitions without additional calculations. The priority in this study was practical, expert data analysis in a short time; the Silhouette coefficient performs well in a three-dimensional feature space and is directly supported in KNIME. The Davies–Bouldin or Calinski–Harabasz indices could provide an additional perspective, but they are less intuitive for those without a deep statistical background, and neither is confined to a fixed range, so their absolute values are harder to interpret. Furthermore, the Silhouette coefficient is easy to combine with expert validation, which was a key element of the authors’ methodology. As a result, a single, consistent metric was chosen, enabling a quick and clear comparison of variants with different numbers of clusters.
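Step (8) can be sketched by scoring k-means partitions for k = 2 to 10 with scikit-learn’s `silhouette_score` and then shortlisting every k whose mean Silhouette value is close to the best one, leaving the final choice to the expert. The helper names and the tolerance of 0.05 are hypothetical; the study does not state a numerical threshold.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(Z: np.ndarray, k_values=range(2, 11), seed: int = 0) -> dict:
    """Mean Silhouette Coefficient of the k-means partition for each candidate k."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(Z)
        scores[k] = float(silhouette_score(Z, labels))
    return scores


def near_best(scores: dict, tol: float = 0.05) -> list:
    """Candidate k values whose score is within `tol` of the best (hypothetical tolerance)."""
    best = max(scores.values())
    return sorted(k for k, s in scores.items() if s >= best - tol)
```

The shortlist formalizes the situation described in the Results section, where a slightly lower score is accepted in exchange for a more informative partition.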
4. Results
Almost every manufacturer of photovoltaic inverters gives its users access to an internet platform for monitoring the operating parameters of the entire photovoltaic system. The inverter, as an Internet of Things device, sends measurement data packets describing the system’s performance (generated power, produced energy, and diagnostic data) to the data cloud. The internet platform, accessible from a web browser or a mobile application, downloads measurement data from the cloud for visualization and processing. The instantaneous power generated by the photovoltaic system can then be used to calculate the amount of energy produced. Increasingly, such platforms and applications can also obtain measurement data from other Internet of Things devices, such as hybrid inverters capable of working with energy storage systems, the energy storage devices themselves, and specific energy receivers such as electric vehicle chargers. Such platforms can also work with bidirectional energy meters or Smart Meter systems capable of measuring current flow in both directions in a non-contact manner. Some applications additionally provide meteorological data, such as ambient air temperature, humidity, and wind speed, which significantly affect the performance of photovoltaic systems. The latest platforms and applications prepared by inverter manufacturers are often very advanced: they not only visualize the processes of generating, storing, and consuming energy dynamically but also prepare periodic reports on system operation automatically.
However, in the authors’ opinion, this is not enough to call them Smart Grid systems, even though they use many advanced algorithms to optimize the operation of the photovoltaic system itself, manage energy storage, and maximize self-consumption. Platforms and applications are increasingly moving toward full Smart Grid functionality, but scientists currently still have to use more advanced tools for acquiring and processing Smart Grid measurement data. This section presents a case study of practical and effective acquisition and processing of measurement data.
4.1. Time Series Analysis of the Power Generated by the Photovoltaic System and the Power Consumed by the University Building
Information on the generated power and the amount of energy produced is important for the Smart Grid. In energy management, however, it is much more valuable to split the power generated by the photovoltaic system into the power consumed directly by the building and the power that exceeds current needs (surplus). The latter can be fed into the power grid or stored in an ESS. The time series breakdown of the power generated by the photovoltaic system is shown in Figure 2. The time series includes power measurements every 15 min for the entire month of March 2025.
The initial review of the measurement data in Figure 2 shows that the generated power very often exceeds self-consumption, and the surplus is fed into the power grid. However, appropriate tools are needed to quantify each power component over specific periods of interest.
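Quantifying energy over a period amounts to summing power over time, E = Σ P · Δt, with Δt = 0.25 h for 15-min samples. The pandas sketch below assumes that each sample is the mean power over its 15-min interval (an assumption; instantaneous snapshots would make the result only approximate), and the function name is hypothetical.

```python
import pandas as pd

SAMPLE_HOURS = 0.25  # one 15-min sample corresponds to 0.25 h


def energy_kwh(power_w: pd.Series, period: str = "D") -> pd.Series:
    """Energy per period (kWh) from 15-min power samples in W (DatetimeIndex).

    Assumes each sample is the mean power over its interval, so that
    energy = sum(P * 0.25 h).
    """
    return (power_w * SAMPLE_HOURS / 1000.0).resample(period).sum()
```

Passing `period="MS"` yields a single monthly total for March 2025, which is built from 31 × 96 = 2976 samples when the series is complete.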
A similar power split was applied on the consumption side. The time series breakdown of the power consumed by the university administration building is shown in Figure 3. The total power consumed by the university building consists of the power generated by the photovoltaic system and consumed for the building’s own needs (auto-consumption) and the power drawn from the power grid. The latter occurs when the power consumed by the building is greater than the power generated by RES. It is worth recalling once again that the analyzed system does not include an ESS. The time series in Figure 3 show that power very often has to be drawn from the grid. In this case, too, the measurement data must be processed to quantify the individual power components over specific time periods.
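Without an ESS, the power split described above follows from two balances per sample: P_pv = Autoconsumption + Surplus and P_load = Autoconsumption + Consumption from grid, with Autoconsumption = min(P_pv, P_load). A minimal sketch, assuming generation and building load are available as aligned series in W; the function name is hypothetical. If the Smart Meter reports export and import directly, the equivalent is Autoconsumption = P_pv − export. With 15-min averages, import and export can both occur within one interval, so this split is an approximation.

```python
import numpy as np
import pandas as pd


def split_power_flows(p_pv: pd.Series, p_load: pd.Series) -> pd.DataFrame:
    """Split PV generation and building load (W) into three non-negative flows.

    Assumes no ESS, so that within each sample
    p_pv = Autoconsumption + Surplus and p_load = Autoconsumption + Consumption from grid.
    """
    auto = np.minimum(p_pv, p_load)  # PV power used directly by the building
    return pd.DataFrame({
        "Autoconsumption": auto,
        "Surplus": p_pv - auto,                  # fed into the grid
        "Consumption from grid": p_load - auto,  # drawn from the grid
    })
```

By construction, Surplus and Consumption from grid are never both positive in the same sample, which is a useful sanity check on measured data.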
The purpose of the analyses and power splits presented in Section 4.1 is to identify the measurement data that will be processed further to obtain information useful in energy management. This is a very important step in the preliminary data analysis, as it separates the important time series from the less important or irrelevant ones.
The authors selected three power time series that will be subject to unsupervised clustering in the remainder of the work:
- (1) Autoconsumption
- (2) Surplus
- (3) Consumption from the grid
The research presented in the remainder of the article continues the authors’ previous work [45], which focused more on the relationship between the power produced and the total power consumed. Previously, the authors performed clustering in the two-state space; in the current work, they use clustering in the three-state space.
4.2. Unsupervised Power Clustering in Three-State Space
After selecting the time series to be categorized (i.e., labeled), the number of clusters must be determined. Figure 4 presents the results of clustering in the three-state space, shown for the Surplus versus Autoconsumption relationship, with 3 clusters (Figure 4a) and with 10 clusters (Figure 4b).
However, the number of clusters in unsupervised clustering should not be chosen arbitrarily. It can be determined using the Silhouette Coefficient, which measures clustering quality. The Silhouette Coefficient values for 2 to 10 clusters are presented in Table 1.
The Silhouette Coefficient measures how well objects are assigned to clusters by comparing distances within and between clusters. It ranges from −1 to 1, where higher values indicate a better fit of an object to its own cluster and a weaker association with neighboring clusters. A high average value across all points suggests well-separated and compact clusters. Values close to zero indicate overlapping clusters or ambiguous assignments, and negative values indicate that points have been assigned to the wrong clusters. The data presented in Table 1 show that the average Silhouette Coefficient values for divisions into 2 to 10 clusters are very similar, which indicates good clustering with well-separated and compact clusters. The highest average Silhouette Coefficient was obtained for the division into 3 clusters (green in the last row of Table 1). However, after expert validation of the results, the authors concluded that a division into 6 clusters would provide considerably more information. The overall value for this clustering (red in the last row of Table 1) is only slightly lower than that for the division into 3 clusters.
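The Silhouette Coefficient described above can be computed directly from its definition: for point i, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and s = (b − a)/max(a, b). The sketch below is a plain-Python illustration (quadratic in the number of points, so only suitable for a month of 15-min records or a sample), with singleton clusters given s = 0 by convention. The helper `rank_cluster_counts` and its `tolerance` parameter are hypothetical: they only list the k values whose mean score is close to the best one, leaving the final choice, such as the authors' choice of 6 clusters, to expert judgment.

```python
import math
from typing import Dict, List, Sequence, Tuple


def silhouette_samples(points: Sequence[Sequence[float]], labels: Sequence[int]) -> List[float]:
    """Per-point silhouette s(i) = (b - a) / max(a, b) with Euclidean distances."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    sizes = {c: labels.count(c) for c in clusters}
    scores = []
    for i, p in enumerate(points):
        own = labels[i]
        if sizes[own] == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # Sum of distances from p to the members of every cluster (p adds 0 to its own).
        sums = {c: 0.0 for c in clusters}
        for q, lab in zip(points, labels):
            sums[lab] += math.dist(p, q)
        a = sums[own] / (sizes[own] - 1)                             # mean intra-cluster distance
        b = min(sums[c] / sizes[c] for c in clusters if c != own)   # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def silhouette_score(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette over all points (the kind of value reported per k in Table 1)."""
    s = silhouette_samples(points, labels)
    return sum(s) / len(s)


def rank_cluster_counts(scores: Dict[int, float],
                        tolerance: float = 0.05) -> Tuple[int, List[int]]:
    """Hypothetical helper: best k plus every k whose mean silhouette is within `tolerance`."""
    best_k = max(scores, key=scores.get)
    close = sorted(k for k, s in scores.items() if scores[best_k] - s <= tolerance)
    return best_k, close
```

In use, `silhouette_score` would be evaluated for the labels produced with each k from 2 to 10, and the resulting dictionary passed to `rank_cluster_counts`; the candidates it returns are a shortlist for expert validation, not a decision.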
The results of clustering in the three-state space of the Surplus versus Autoconsumption relationship, with a division into 6 clusters, are presented in Figure 5. These results clearly show that the Surplus values for most clusters (except for cluster_1 in purple) range from 0 to 4000 W, that is, these are small surpluses. For cluster_1, the power delivered to the power grid ranges from approx. 4000 W to approx. 11,000 W. Even this visualization of the clustering results yields important information for energy management. Power values in the range from 4000 to 11,000 W can be used to charge electric vehicles, since the most popular wallboxes for charging electric vehicle batteries typically operate in this power range. The conclusion from clustering is therefore that it is better to charge an electric vehicle from the existing surplus than to deliver power to the power grid at very low prices. The size of cluster_1 is of significant importance in this analysis, owing to the amount of energy produced and the appropriate selection of a vehicle with a suitable traction battery capacity. The second solution is to transfer the surplus to a stationary energy storage system (ESS).
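The wallbox argument can be checked with simple arithmetic. A three-phase wallbox at 230 V per phase draws about 11 kW at 16 A and about 4.1 kW at 6 A, which is the lowest charging current commonly signalled under IEC 61851; this roughly matches the 4000–11,000 W range of cluster_1. The sketch below estimates how much surplus energy such a charger could absorb from 15-min Surplus readings, assuming these nominal values and assuming the vehicle charges from surplus only when the surplus reaches the wallbox minimum. The constants and function names are illustrative assumptions, not measured parameters of the installation.

```python
from typing import Iterable, Tuple

PHASE_VOLTAGE_V = 230.0  # nominal phase-to-neutral voltage (assumption)
PHASES = 3               # three-phase wallbox (assumption)
MIN_CURRENT_A = 6.0      # lowest charging current commonly allowed by IEC 61851 signalling
MAX_CURRENT_A = 16.0     # 16 A per phase, i.e. an "11 kW" wallbox (assumption)
INTERVAL_H = 0.25        # 15-min measurement interval


def wallbox_limits_w() -> Tuple[float, float]:
    """Minimum and maximum charging power of the assumed wallbox, in W."""
    per_amp = PHASES * PHASE_VOLTAGE_V
    return per_amp * MIN_CURRENT_A, per_amp * MAX_CURRENT_A


def ev_energy_from_surplus_kwh(surplus_w: Iterable[float]) -> float:
    """Surplus energy an EV could absorb, charging only when surplus >= minimum power."""
    p_min, p_max = wallbox_limits_w()
    energy_wh = 0.0
    for p in surplus_w:
        if p >= p_min:  # below p_min the car would also need power from the grid
            energy_wh += min(p, p_max) * INTERVAL_H  # capped at the wallbox rating
    return energy_wh / 1000.0
```

For example, `ev_energy_from_surplus_kwh([2000, 5000, 12000, 8000])` gives about 6.01 kWh: the first reading is below the 4140 W minimum, and the 12,000 W reading is capped at 11,040 W.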
The results of clustering in the three-state space of the Consumption from grid versus Autoconsumption relationship, with a division into 6 clusters, are presented in Figure 6. These clustering results contain very important information, especially for matching the peak power of the photovoltaic system to the power demand of the university building.
Figure 6 clearly shows that Consumption from the power grid occurs across the entire range of Autoconsumption. The largest Consumption from the power grid is represented by cluster_0 (green), in which the power drawn from the grid ranges from about 10,000 W to over 30,000 W. This indicates significant shortages of power generated by the photovoltaic system in relation to the energy needs of the building. Further research will allow a preliminary quantification of these power shortages. However, other clusters, such as cluster_2 (brown), cluster_1 (red), and cluster_5 (orange), may also have a significant share in the power shortage. Clusters such as cluster_4 (blue) and cluster_3 (purple) will not contribute significantly to electricity bills for power drawn from the grid.
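Power ranges such as "about 10,000 W to over 30,000 W" for cluster_0 can be read off programmatically once every 15-min record carries a cluster label. The minimal sketch below, with a hypothetical function name, returns the minimum, maximum, and number of records of one power series per cluster; in practice, percentiles might be preferred to minima and maxima so that single outliers do not stretch the reported ranges.

```python
from typing import Dict, Sequence, Tuple


def cluster_ranges(power_w: Sequence[float],
                   labels: Sequence[int]) -> Dict[int, Tuple[float, float, int]]:
    """(min, max, number of records) of one power series for each cluster label."""
    stats: Dict[int, Tuple[float, float, int]] = {}
    for p, lab in zip(power_w, labels):
        lo, hi, n = stats.get(lab, (p, p, 0))
        stats[lab] = (min(lo, p), max(hi, p), n + 1)
    return dict(sorted(stats.items()))
```

Applied to the Consumption from grid series with the 6-cluster labels, the entry for cluster_0 would give the range quoted above.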
Comparing the Surplus plot (Figure 5) with the plot of power drawn from the grid (Figure 6), one can immediately see that even if the entire surplus were collected in energy storage, it would not be able to cover the consumption from the power grid. It could, however, significantly reduce the energy shortages.
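The statement that even a fully stored surplus could not cover the grid draw can be bounded with the monthly totals reported in Table 2 (0.51 MWh of Surplus versus 2.52 MWh of Consumption from grid). The sketch below is a deliberately optimistic bound under stated assumptions: every stored kWh is assumed to be matched in time with grid draw, storage size and power limits are ignored, and the round-trip efficiency is an assumed parameter, not a property of any specific ESS.

```python
from typing import Tuple


def storage_upper_bound(surplus_mwh: float, grid_mwh: float,
                        round_trip_eff: float = 1.0) -> Tuple[float, float, float]:
    """Best-case grid offset if all surplus were stored and later used.

    Returns (grid energy covered, remaining grid energy, covered fraction).
    Ignores timing, storage capacity and power limits, so it is an upper bound.
    """
    usable = surplus_mwh * round_trip_eff
    covered = min(usable, grid_mwh)
    return covered, grid_mwh - covered, covered / grid_mwh
```

With 0.51 MWh and 2.52 MWh, at most about 0.51 MWh (roughly 20%) of the grid draw is covered and about 2.01 MWh remains; with an assumed 90% round-trip efficiency, the covered part drops to about 0.46 MWh.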
The results of clustering in the three-state space of the Consumption from grid versus Surplus relationship, with a division into 6 clusters, are presented in Figure 7. They provide important information for energy management about the coexistence of Surplus and Consumption from the power grid. The previously identified large cluster_0 (green) occurs almost exclusively for zero Surplus values. In turn, cluster_3 (purple), identified as large and important, occurs almost exclusively for zero values of Consumption from the power grid. The analysis shows that these two clusters rarely, if ever, occur simultaneously. Clusters such as cluster_1 (red) and cluster_5 (orange) look very interesting in this comparison, as they clearly coexist on the graph. However, their range is small: from 0 to approx. 4000 W on the Surplus axis and from 0 to approx. 14,000 W on the Consumption from the power grid axis. A reasonable question arises: how can Surplus and Consumption from the power grid coexist at the same time? The only explanation is an asymmetry in the load of the individual three phases of the building's electrical network, combined with the photovoltaic inverter supplying all three phases symmetrically. These results should prompt the energy manager of the university building to measure the uniformity of the load of the individual phases of the electrical network. The use of an ESS or an electric vehicle charger seems to be a solution for utilizing the Surplus.
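The proposed explanation, phase asymmetry combined with a symmetric inverter, can be illustrated numerically. The sketch below assumes, hypothetically, that the inverter injects one third of its output into each phase and that import and export are registered per phase without netting between phases; with metering that nets the three phases against each other, the effect would not appear in this form. Both function names are illustrative.

```python
from typing import Sequence, Tuple


def per_phase_balance(pv_w: float, phase_loads_w: Sequence[float]) -> Tuple[float, float, float]:
    """(Autoconsumption, Surplus, Consumption from grid) in W, settled phase by phase.

    Assumes a symmetric inverter (equal share per phase) and per-phase registration
    of import and export (no netting between phases).
    """
    share = pv_w / len(phase_loads_w)
    auto = sum(min(share, load) for load in phase_loads_w)
    surplus = sum(max(share - load, 0.0) for load in phase_loads_w)
    grid = sum(max(load - share, 0.0) for load in phase_loads_w)
    return auto, surplus, grid


def netted_balance(pv_w: float, phase_loads_w: Sequence[float]) -> Tuple[float, float, float]:
    """The same quantities when the three phases are netted against each other."""
    load = sum(phase_loads_w)
    return min(pv_w, load), max(pv_w - load, 0.0), max(load - pv_w, 0.0)
```

With 9 kW of PV output and phase loads of 1, 2, and 8 kW, per-phase settlement records 3 kW of Surplus and 5 kW of Consumption from grid at the same time, whereas netting the phases gives no Surplus and only 2 kW drawn from the grid.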
The relationships between the analyzed parameters, divided into clusters, presented in Figure 5, Figure 6, and Figure 7, provided significant information on the co-occurrence of selected clusters in specific power ranges. Quantitative data on the size of the clusters can be provided by a statistical analysis of the clustering results, which is presented as a bar chart in Figure 8. The relative frequency of occurrence of individual clusters allows their significance in balancing power production and consumption in the Smart Grid system to be determined. Cluster_4 (blue) has the highest relative frequency, so we first perform a preliminary analysis of its impact on the system's balancing capabilities. As shown, especially in Figure 5 and Figure 6, the powers in this cluster are zero or close to zero. However, its impact cannot be neglected in assessing the system's functioning, because its relative frequency of occurrence (0.53) is almost four times higher than that of the next most frequent cluster. The records in this cluster constitute more than half of all records in the month under study, and even small amounts of power, up to 5000 W, can translate into large amounts of energy in Surplus, Autoconsumption, and Consumption from grid alike. The second largest cluster is cluster_2 (brown). In the three-dimensional space, it gathers records of small Surplus power, medium Consumption from grid power, and small Autoconsumption power. The third largest, cluster_1 (red), is more complicated in terms of the powers it represents. It includes states of the power generation and consumption process characterized by small and medium Surplus power and small and medium Consumption from grid power, occurring at medium Autoconsumption power values. For an expert, this cluster presents a big challenge in the area of collecting the Surplus and potentially returning it to cover Consumption from grid. Cluster_0 (green) has a relative frequency of 0.09. However, it categorizes states with almost no Surplus and large or very large Consumption from grid powers, which occur for zero to medium Autoconsumption values. This cluster is therefore responsible for a large part of the power drawn from the power grid, and it has no possibility of self-balancing because of its almost zero share of Surplus power. Cluster_5 (orange) has a much greater possibility of self-balancing, as does cluster_1 (red), analyzed earlier. In contrast to cluster_1, however, both its Surplus and Consumption from grid powers have slightly higher values and occur at higher Autoconsumption powers. Cluster_3 (purple) has the lowest relative frequency of occurrence, amounting to 0.07.
This cluster includes the largest Surplus powers and zero or close to zero Consumption from grid powers. Surplus powers from this cluster can be stored and potentially released for the needs of cluster_0 (green). At this stage, it remains unclear whether surplus energy can be effectively stored and later used during peak demand periods.
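The relative frequencies in Figure 8 are the share of 15-min records assigned to each cluster (March has 31 × 96 = 2976 such intervals). The minimal sketch below, with hypothetical function names, computes these shares and also how many times more often the most frequent cluster occurs than the runner-up, the kind of comparison made above for cluster_4.

```python
from collections import Counter
from typing import Dict, Sequence


def relative_frequencies(labels: Sequence[int]) -> Dict[int, float]:
    """Share of records assigned to each cluster; the values sum to 1."""
    counts = Counter(labels)
    n = len(labels)
    return {lab: counts[lab] / n for lab in sorted(counts)}


def dominance_ratio(labels: Sequence[int]) -> float:
    """How many times more frequent the largest cluster is than the second largest.

    Requires at least two distinct labels.
    """
    (_, first), (_, second) = Counter(labels).most_common(2)
    return first / second
```

With the frequencies reported in Section 5.1 (0.53 for cluster_4 and 0.14 for cluster_2), this ratio is about 3.8.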
Once the physical meaning of the individual clusters in the process of generating and consuming power and their frequency of occurrence are known, we can move on to the next step: presenting, in the form of heatmaps, the number of occurrences of individual clusters in specific time intervals, as shown in Figure 9.
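A heatmap such as Figure 9 is a table of counts with one row per cluster and one column per time slot of the day. The sketch below builds that table from labeled timestamps; the function name and the slot width parameter (60 min by default, 15 min to match the measurement interval) are illustrative. It assumes timestamps in local time, since UTC timestamps would shift every column relative to local sunrise and sunset.

```python
from datetime import datetime
from typing import Dict, List, Sequence


def time_of_day_heatmap(timestamps: Sequence[datetime], labels: Sequence[int],
                        slot_minutes: int = 60) -> Dict[int, List[int]]:
    """Count how often each cluster occurs in each time slot of the day."""
    if 1440 % slot_minutes:
        raise ValueError("slot_minutes must divide 1440")
    n_slots = 1440 // slot_minutes
    heat: Dict[int, List[int]] = {}
    for ts, lab in zip(timestamps, labels):
        slot = (ts.hour * 60 + ts.minute) // slot_minutes
        heat.setdefault(lab, [0] * n_slots)[slot] += 1
    return dict(sorted(heat.items()))
```

Each list in the result is one heatmap row; plotting it, for example with a matrix-display function of a plotting library, gives the layout of Figure 9.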
At this point, let us return to the problem indicated in the previous paragraph. Is it really possible to use the Surplus from cluster_3 (purple) to cover the Consumption from grid occurring in cluster_0 (green)? This is best seen in the heatmap: cluster_3 (purple) occurs from sunrise to sunset. In Polish geographical conditions, sunset in March occurs around 16.00. The ESS can then be charged with surplus power. Cluster_0 (green) also occurs during the day, but in the evening as well: it starts to occur a little later than cluster_3 (purple) and lasts over 3 h longer, after sunset. Especially after sunset, the energy stored in the ESS can be used to power the university building, which is open daily until 20.00. Cluster_2 (brown) is also responsible for drawing power from the power grid in the evening; in this state, too, the Smart Grid could use the energy stored in the ESS instead of drawing energy from the grid.
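The day-to-evening shift discussed above can be explored with a simple greedy storage simulation over the 15-min series: Surplus charges the ESS, and stored energy is discharged whenever power is drawn from the grid. The capacity, power limit, and single charging efficiency are assumed parameters, and the greedy rule is one simple dispatch policy, not an optimized strategy or the authors' method.

```python
from typing import Sequence, Tuple

INTERVAL_H = 0.25  # 15-min readings


def simulate_ess(surplus_w: Sequence[float], grid_w: Sequence[float],
                 capacity_kwh: float, max_power_w: float,
                 efficiency: float = 1.0) -> Tuple[float, float, float]:
    """Greedy ESS dispatch.

    Returns (grid energy still drawn, surplus energy still exported,
    final state of charge), all in kWh. `efficiency` is applied on charging.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    step_limit_kwh = max_power_w * INTERVAL_H / 1000.0  # energy per interval at max power
    soc = grid_after = export_after = 0.0
    for s, g in zip(surplus_w, grid_w):
        surplus_kwh = s * INTERVAL_H / 1000.0
        # Charge: limited by the surplus, the converter power and the free capacity.
        stored = min(min(surplus_kwh, step_limit_kwh) * efficiency, capacity_kwh - soc)
        soc += stored
        export_after += surplus_kwh - stored / efficiency
        # Discharge: cover as much of the grid draw as power and charge allow.
        demand_kwh = g * INTERVAL_H / 1000.0
        discharged = min(demand_kwh, step_limit_kwh, soc)
        soc -= discharged
        grid_after += demand_kwh - discharged
    return grid_after, export_after, soc
```

Running it on the month's Surplus and Consumption from grid series for a few candidate capacities shows how quickly the benefit saturates, which is one way to approach the storage sizing question that clustering alone leaves open.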
We had doubts earlier about the significance of the largest cluster, cluster_4 (blue). Displaying the number of cluster occurrences in individual periods of the day and night as heatmaps completely dispelled these doubts. We can state that the largest cluster has a very small impact on balancing the power generation and consumption system, because the states in this cluster occur mainly at night, that is, from sunset to sunrise. During the day, states belonging to this cluster occur very rarely throughout the month.
Since the power readings assigned to individual clusters are available at 15-min intervals, they can be converted to energy (each reading multiplied by 0.25 h) and summed over the entire month. The results of this summation are presented in Table 2. They complement the results on the relative frequency of individual clusters (Figure 8). The total energy for the month indicates that the production of the photovoltaic system is not sufficient to meet the building's total electricity demand: an additional 2.52 MWh of energy had to be drawn from the power grid. In the same month, a surplus of 0.51 MWh of energy was fed into the grid.
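The summation behind Table 2 can be sketched as follows: each 15-min power reading is converted into energy by multiplying it by 0.25 h before summing per cluster. The function name is hypothetical. As a sanity check, a constant 1 kW over the 2976 intervals of March gives 1 kW × 744 h = 0.744 MWh.

```python
from collections import defaultdict
from typing import Dict, Sequence


def energy_per_cluster_mwh(power_w: Sequence[float], labels: Sequence[int],
                           interval_h: float = 0.25) -> Dict[int, float]:
    """Monthly energy (MWh) of one power series, split by cluster label."""
    totals_wh: Dict[int, float] = defaultdict(float)
    for p, lab in zip(power_w, labels):
        totals_wh[lab] += p * interval_h  # W x h = Wh for this 15-min interval
    return {lab: wh / 1e6 for lab, wh in sorted(totals_wh.items())}
```

Calling it once for each of the three series (Autoconsumption, Surplus, Consumption from grid) with the same cluster labels yields the three columns of a table such as Table 2.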
The relationship between the relative frequency of the states (clusters) of the power generation and consumption system and the corresponding energy production and consumption is discussed in the next section.
5. Discussion
5.1. General Observations
The signatures of generated and consumed power determined by unsupervised clustering in the three-state space provide important information about the operation of the Smart Grid. They allow a quick assessment of whether the peak power of the photovoltaic system has been correctly matched to the energy needs of the university building. In addition, clustering makes it possible to determine the occurrence of Surplus and Consumption from the grid and to precisely localize the corresponding states in specific time intervals of the day and night.
First, the significance of the largest cluster in terms of frequency of occurrence is verified quantitatively. Cluster_4 (blue), with a frequency of 53%, includes operating states that in fact correspond to small amounts of Autoconsumption energy (70.24 kWh), Surplus energy (55.63 kWh), and Consumption from grid energy (101.82 kWh). This quantitatively confirms the previous assumptions.
The largest amounts of Autoconsumption energy correspond to cluster_5 (1.44 MWh), cluster_3 (0.89 MWh), and cluster_1 (0.86 MWh), whose relative frequencies of occurrence are 8%, 7%, and 10%, respectively. In the area of Surplus energy, the largest amounts correspond to cluster_0 (0.41 MWh) and cluster_3 (0.30 MWh), with relative frequencies of occurrence of 9% and 7%, respectively. The largest amounts of energy Consumed from the grid are attributable to cluster_0 (1.01 MWh) and cluster_2 (0.74 MWh), with relative frequencies of occurrence of 9% and 14%, respectively.
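These examples lead to the conclusion that the relative frequencies of individual clusters should be supplemented with the amounts of energy produced or consumed in the three analyzed state spaces, because a high frequency of occurrence does not necessarily imply a large energy share, as the example of cluster_4 shows.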
Based on the results of the unsupervised clustering research, a quantitative assessment of the suitability of the photovoltaic system for the building's energy needs can be formulated. In the analyzed month (March 2025), the 50 kWp PV installation generated approximately 4.265 MWh (Autoconsumption 3.752 MWh + Surplus 0.513 MWh), while the building's total consumption was 6.267 MWh. This means a PV share of demand coverage of ~68.0% (i.e., a theoretical self-sufficiency of up to 68.0% with ideal shifting of the surplus) and an actual self-sufficiency without storage of ~59.9% (Autoconsumption/consumption), with an autoconsumption ratio of ~88.0% and Consumption from the grid of 2.515 MWh (40.1% of consumption). The average daily surplus was approximately 16.5 kWh/day, and the cluster characteristics indicate recurring windows of daytime surpluses (e.g., 4–11 kW in cluster_1) and evening deficits (10–30+ kW in cluster_0). This confirms that the current level of PV generation is insufficient to fully cover demand without storage, but consistent with partial coverage of 60–68% under March conditions. Consequently, investment decisions (e.g., an ESS) may aim to capture approximately 0.51 MWh/month of surplus (≈15–20 kWh/day) with a charging power of 4–11 kW, which addresses the day-to-evening energy shift indicated by clustering.
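The indicators quoted above follow from the three monthly energies alone. The sketch below recomputes them; the function and key names are illustrative. Called with 3.752 MWh of Autoconsumption, 0.513 MWh of Surplus, 2.515 MWh of Consumption from grid, and 31 days, it reproduces the ~68.0%, ~59.9%, ~88.0%, 40.1%, and ~16.5 kWh/day figures.

```python
from typing import Dict


def pv_indicators(auto_mwh: float, surplus_mwh: float, grid_mwh: float,
                  days: int) -> Dict[str, float]:
    """Monthly PV indicators derived from Autoconsumption, Surplus and Consumption from grid."""
    production = auto_mwh + surplus_mwh  # everything the PV system generated
    consumption = auto_mwh + grid_mwh    # everything the building used
    return {
        "production_mwh": production,
        "consumption_mwh": consumption,
        "pv_coverage": production / consumption,         # ceiling with ideal surplus shifting
        "self_sufficiency": auto_mwh / consumption,      # actual, without storage
        "autoconsumption_ratio": auto_mwh / production,  # share of PV output used on site
        "grid_share": grid_mwh / consumption,
        "surplus_kwh_per_day": surplus_mwh * 1000.0 / days,
    }
```

The gap between `pv_coverage` and `self_sufficiency` (about 8 percentage points here) is the part of demand that ideal storage of the surplus could add at most.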
5.2. Limitations
However, the applied research approach has certain limitations compared with other research methods. It provides some quantitative data on the relative frequency of occurrence of individual clusters, but it does not lead to a precise determination of the required energy storage size. Nor does it allow an accurate power balance to be performed for the examined period. It also does not provide Surplus and Consumption from the grid data in the form of full probability distributions, such as can be obtained using the Metalog family of probability distributions.
The obvious advantage of using unsupervised clustering to assess power generation by the photovoltaic system and its consumption by the university building is the simplicity and speed of the calculations. With ready-made tools and a prepared calculation algorithm, it is possible to compare signatures on the Autoconsumption, Surplus, and Consumption from grid side in a few minutes. The applied approach assumes the analysis of one month, which, according to the authors, is the optimal period given the seasonality of both generated and consumed power.
Another limitation of the approach used by the authors is that the analysis covers only a single month: March. This significantly limits the generalizability of the results to other periods of the year. Energy production from photovoltaic systems is characterized by strong seasonality resulting from changes in day length, solar elevation angle, and meteorological conditions; the conditions specific to March in Poland may not reflect the situation in the summer or winter months. As a result, the distribution of self-consumption, surpluses, and grid consumption in the analyzed period may differ significantly from that in periods of greater solar radiation or of increased electricity demand. Furthermore, the days in March are relatively short, which limits the time during which PV installations can operate at full capacity and may lead to an overestimate of the relative share of energy drawn from the grid compared with months with longer days. The lack of data for the entire year prevents an assessment of the extent to which the identified clusters and their frequency of occurrence persist in summer, when surpluses can be significantly greater, or in winter, when they may practically disappear. In the presented study, which focused on a single month, the authors intended to present the research method for such a representative time period rather than to conduct a year-long analysis. Considering the full annual cycle will allow a more comprehensive assessment of energy balancing capabilities and more accurate recommendations, for example, regarding the selection of energy storage capacity. Furthermore, seasonality can affect the effectiveness of energy management strategies, such as charging electric vehicles from surplus energy, whose availability varies with the season. While the analysis covering only March provides valuable operational conclusions for this period, extrapolating them to other months requires caution.
In future studies, extending the observations to a full calendar year could significantly increase the reliability and usefulness of the results in the context of long-term planning. Accounting for seasonal variability is crucial for correctly assessing system performance under real-world operating conditions. The authors would also like to note that conditions related to power generation and consumption are dynamic and can change significantly. These may be caused, for example, by weather conditions varying in March compared to previous years, which will significantly impact the amount of energy produced by the photovoltaic system. The amount of energy consumed may depend on the nature of work performed by university administrative staff, the number of people on leave, the implementation of electricity saving strategies, or renovation work. This leads to the conclusion that energy management should focus on monthly analyses, based on which conclusions should be drawn for the following month. Looking for correlations between the same months in different years makes little sense, both in terms of generated and consumed energy.
5.3. The Ability to Scale the Calculation Method and Use It in Real-Time Systems
In terms of scalability, it should be noted that the approach used, while effective for analyzing monthly measurement data, may require optimization when working with larger datasets spanning multiple years or with high temporal resolution. The computational cost of the k-means algorithm grows with the number of points, their dimensionality, and the number of clusters, which may limit its performance in stream processing environments. Real-time implementation would require incremental versions of clustering algorithms or integration with big data platforms that enable parallel processing. The time required for preprocessing steps, such as normalization, imputation of missing values, and denormalization of results, must also be considered, as these steps must be performed continuously in online mode. Scalability is therefore possible, but it requires adapting tools and procedures to maintain the speed and quality of analysis with significantly larger data volumes.
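One way to meet the incremental requirement is sequential (MacQueen-style) k-means: each new 15-min record is assigned to the nearest centroid, and only that centroid is moved, by a running-mean step. The class below is a hypothetical sketch that also performs min-max normalization with fixed bounds and denormalizes the centroids for reporting. The bounds, the initial centroids (for example, taken from the previous month's batch clustering), and the choice to count each initial centroid as one observation are assumptions.

```python
import math
from typing import List, Sequence


class OnlineKMeans:
    """Sequential k-means over normalized (Autoconsumption, Surplus, Grid) records."""

    def __init__(self, initial_centroids_w: Sequence[Sequence[float]],
                 lower_w: Sequence[float], upper_w: Sequence[float]) -> None:
        self.lower = list(lower_w)
        self.upper = list(upper_w)
        self.centroids = [self._scale(c) for c in initial_centroids_w]
        self.counts = [1] * len(self.centroids)  # each seed counts as one observation

    def _scale(self, x: Sequence[float]) -> List[float]:
        # Fixed min-max bounds keep every streamed point in the same coordinate system.
        return [(v - lo) / (hi - lo) for v, lo, hi in zip(x, self.lower, self.upper)]

    def _unscale(self, z: Sequence[float]) -> List[float]:
        return [lo + v * (hi - lo) for v, lo, hi in zip(z, self.lower, self.upper)]

    def update(self, record_w: Sequence[float]) -> int:
        """Assign one new record, move its centroid, and return the cluster index."""
        z = self._scale(record_w)
        j = min(range(len(self.centroids)), key=lambda i: math.dist(z, self.centroids[i]))
        self.counts[j] += 1
        step = 1.0 / self.counts[j]  # running mean of all points seen by cluster j
        self.centroids[j] = [c + step * (v - c) for c, v in zip(self.centroids[j], z)]
        return j

    def centroids_w(self) -> List[List[float]]:
        """Centroids converted back to watts (denormalized)."""
        return [self._unscale(c) for c in self.centroids]
```

Each update costs O(k·d), independent of how many records have already been seen. Because a centroid built from many points moves only slightly per record, resetting or decaying the counts, for example monthly, would be needed for it to follow changing consumption patterns.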
The authors' approach focuses on analyzing data from a photovoltaic system through three-dimensional clustering (self-consumption, surplus, grid consumption) using k-means, supported by expert knowledge and yielding interpretable results. In a recent study [46], an advanced optimization algorithm (the Sparrow Search Algorithm) is used to tune a PI controller, which improves the stability, power quality, and dynamic response of a PV system integrated with a microgrid. Both solutions aim to improve the performance of photovoltaic systems, but the former operates at the analytical level (studying energy consumption and production patterns), while the latter operates at the real-time control level, focusing on improving the system's parametric control performance.
5.4. Application of Research Results in Agriculture
The methodology presented in this study, based on the acquisition, processing, and unsupervised clustering of Smart Grid measurement data, can be effectively adapted to the agricultural sector. Modern farms increasingly rely on renewable energy sources, particularly photovoltaic systems, to power irrigation pumps, refrigeration units, feeding systems, and electric agricultural vehicles. By applying clustering techniques to time series data of self-consumption, surplus generation, and grid consumption, it is possible to identify operational patterns that optimize the use of on-site renewable energy.
In agricultural applications, surplus energy identified during peak PV production hours could be directed toward energy-intensive tasks such as water pumping for irrigation or charging battery-powered machinery. Conversely, clustering can reveal periods of high grid dependence, which may indicate the need for energy storage systems or adjustments to operational schedules. This approach is particularly valuable in off-grid or weak-grid rural areas, where balancing local energy production and consumption is crucial for maintaining operational continuity.
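As an illustration of directing surplus to flexible agricultural loads, the hypothetical helper below picks the 15-min intervals with the largest expected surplus for an irrigation pump that must run a given number of intervals, and reports how much energy would still come from the grid. It assumes a constant pump power, no constraints on when or how often the pump may start, and a surplus forecast or clustering-based profile as input; real scheduling would have to add those constraints.

```python
from typing import List, Sequence, Tuple

INTERVAL_H = 0.25  # 15-min intervals


def schedule_pump(surplus_w: Sequence[float], pump_power_w: float,
                  intervals_needed: int) -> Tuple[List[int], float]:
    """Greedy choice of the intervals with the largest surplus for a constant-power pump.

    Returns (chosen interval indices in time order, grid energy in kWh still needed).
    """
    if intervals_needed > len(surplus_w):
        raise ValueError("not enough intervals in the horizon")
    ranked = sorted(range(len(surplus_w)), key=lambda i: surplus_w[i], reverse=True)
    chosen = sorted(ranked[:intervals_needed])
    shortfall_w = sum(max(pump_power_w - surplus_w[i], 0.0) for i in chosen)
    return chosen, shortfall_w * INTERVAL_H / 1000.0
```

In this unconstrained case, the greedy choice is optimal: the grid shortfall of an interval never increases as its surplus grows, so taking the largest surpluses minimizes the total grid energy.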
Furthermore, integrating the clustering results with meteorological and crop growth cycle data can improve the precision of energy demand forecasting. Such integration enables strategic planning of agricultural operations in line with renewable energy availability, contributing to cost reduction, increased energy self-sufficiency, and enhanced sustainability in agricultural production.