Toward Smart SCADA Systems in the Hydropower Plants through Integrating Data Mining-Based Knowledge Discovery Modules

Grigoras, Gheorghe; Gârbea, Răzvan; Neagu, Bogdan-Constantin

doi:10.3390/app14188228



by Gheorghe Grigoras *, Răzvan Gârbea and Bogdan-Constantin Neagu

Department of Power Engineering, “Gheorghe Asachi” Technical University of Iasi, 700050 Iasi, Romania

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(18), 8228; https://doi.org/10.3390/app14188228

Submission received: 14 August 2024 / Revised: 9 September 2024 / Accepted: 11 September 2024 / Published: 12 September 2024

(This article belongs to the Special Issue Intelligent Computing Systems and Their Applications)


Abstract

The increasing importance of hydropower generation has led to the development of new smart technologies and the need for reliable and efficient equipment in this field. Since hydropower plants are more complex to build than other power plants, their operating regimes and maintenance activities are essential for hydropower companies seeking to optimize performance, and incorporating data-driven approaches into the decision-making process remains a challenge. In this paper, a comprehensive multi-task framework, integrated into a Data Mining-based Knowledge Discovery module, has been designed, developed, and tested to support the decisions of control-room operators and facilitate the transition from classical to smart Supervisory Control and Data Acquisition (SCADA) systems in hydropower plants. The framework integrates tasks for detecting outliers through advanced statistical procedures, identifying operating regimes through the patterns associated with typical operating profiles, and developing strategies for loading the generation units that consider the number of operating hours and minimize the amount of water used to satisfy the power required by the system. The proposed framework has been tested using the SCADA database of a hydropower plant belonging to the Romanian HydroPower Company. It can offer control-room operators comparative information over a time horizon longer than one year. The tests demonstrated the usefulness of a Knowledge Discovery module in ensuring the transition toward smart SCADA systems that will help decision-makers improve the management of hydropower plants.

Keywords:

smart SCADA; knowledge discovery; data mining; clustering; hydropower plants

1. Introduction

Alongside wind and solar, hydropower is one of the most important sources of electricity globally, providing over 15% of the world’s electricity, with a total hydropower fleet of 1412 GW in 2023 [1]. It helps to cut greenhouse gas emissions, a major cause of global warming. The electrical power sector is doing its part to reduce its environmental impact by using clean and renewable energy sources [2]. Hydropower plants currently account for over 75% of the world’s renewable energy sources and around 30% of the world’s flexible electricity supply capacity [3,4]. One of the most important factors contributing to the security and flexibility of power systems is the ability of hydropower plants to ramp up generation rapidly compared to other power plants, such as coal, natural gas, and nuclear; they can also be stopped and started relatively quickly. Because of this high degree of flexibility, hydropower plants can quickly adapt to changes in energy demand and compensate for variations in supply from other energy sources, which makes them an ideal choice to support the integration of wind and solar power. Despite their widespread use, hydropower plants still have huge potential to expand globally [3].

In addition, efficient planning and computational enhancements can increase the energy output using the same available water [2]. The optimal operation of a hydroelectric power plant involves gathering and processing vast amounts of input data. Unfortunately, the techniques used in the design and implementation of the plant’s operations do not always extract the most value from these data. This paper proposes a clustering-based method that can help the Decision-Makers (DEMA) identify the optimal hourly load patterns of the generators. Daily generation scheduling is an integral part of the decision-making process in a power plant; it reduces the time needed to make critical decisions and supports effective maintenance plans and energy production strategies [5].

The increasing importance of hydropower generation has led to the development of new technologies and the need for reliable and efficient equipment, which has become a vital factor in the operation and maintenance of these machines. Hydropower plants are typically more expensive to build than other energy sources, but they have a longer lifespan: a hydropower plant can operate for 50 years or more, longer than thermal plants, although economic and financial analyses usually assume a lifespan of 30–40 years [6]. The operation and maintenance (O&M) costs should not be overlooked by the DEMA. They are typically around 2% of the investment; the specific O&M cost is around 2% to 2.5% for large projects and ranges from 1% to 6% for small projects [7]. Although routine maintenance is carried out on the equipment, digital solutions can improve predictive maintenance, allowing hydropower plants to increase their efficiency [8] and helping to maximize the life of a plant’s resources and assets. A study presented in [9] concluded that digitalization could improve the efficiency of a hydropower plant by 1% by better distributing the flow among the different turbine units, and that the annual energy production could be increased by approximately 11%, depending on the site, if spills and the hours spent managing manual operations are reduced. Digitalization can also prevent costly repairs by identifying potential issues early. A key step in implementing this approach is to continuously monitor the equipment’s condition through a smart SCADA system into which Knowledge Discovery modules are integrated. The core step of these Knowledge Discovery modules is Data Mining, which extracts information and transforms it into a format (containing the operation patterns) that is more understandable for the DEMA, i.e., the operators in the control room of the hydropower plant.
The information provided by such Knowledge Discovery modules can help extend the lifetime of the power units by reducing their downtime and enhancing their production. It can also minimize operation and maintenance costs [10].

Compared to other sustainable initiatives, such as eco-friendly products and renewable energy projects, smart SCADA systems are less visible. However, they can play a vital role in helping the hydropower sector improve resource utilization, optimize maintenance operations, and fulfill sustainability objectives [11]. The following strengths of a smart SCADA system can be highlighted regarding sustainability objectives [12,13,14]:

Remote monitoring and control: Thanks to remote terminal units (RTUs), SCADA systems allow the dispatcher to monitor and control the aggregates/equipment from the control room. This reduces the need for the service team to travel to the physical locations of the electrical/mechanical aggregates of the hydropower plant, contributing to sustainability by lowering the carbon emissions associated with transportation.
Energy efficiency: By monitoring the process in real time and storing historical data, a smart SCADA system can help reduce the amount of water drawn from the dam. This enables operators to identify malfunctions and leaks in the aggregates and installations, leading to timely repairs and lower energy consumption.
Compliance with environmental regulations: A smart SCADA system can generate reports and track key environmental metrics, allowing hydropower companies to reduce their ecological footprint by monitoring the working hours of their generators and turbines. This enables them to plan maintenance operations more precisely, which saves resources and minimizes downtime.
Predictive maintenance: A smart SCADA system supports predictive maintenance by tracking the performance and running hours of the hydro aggregates. This helps identify potential issues and plan preventive maintenance activities, which extends the lifespan of the equipment and reduces waste and the impact of manufacturing new machinery and parts. These systems can also issue real-time notifications in the event of malfunctions.
Rapid response to issues: Integration with the Internet of Things can give a SCADA system rapid response capabilities, allowing it to monitor and respond to environmental incidents and operational issues in real time. It can also prevent more significant issues, such as equipment failures or water leaks.

The challenge associated with a smart SCADA system can be expressed through the following two questions:

How can a deeper analysis of the data from the various processes and equipment within the hydropower plant be performed?
What is the best approach to perform this analysis?

This process must be carried out efficiently to extract the maximum information from the SCADA database. The literature presents different Knowledge Discovery applications in the hydropower industry. Parvez et al. [2] proposed a linear regression procedure to determine the energy production relationship between upstream and downstream hydro plants and performed a cluster analysis to find the typical generation curves. Feng et al. [15] developed a class-based extreme learning machine that can determine the optimal operation rule for a hydropower reservoir. A k-means clustering algorithm splits the influence factors into several subregions, and the extreme learning machine is then optimized by particle swarm optimization to identify the complex relationship between each cluster’s input and output. Zhang et al. [16] proposed an approach that improves the quality of the monitoring data collected from hydropower units by means of a clustering algorithm; the approach can be used to solve various condition-monitoring problems. The authors of [17] proposed a standard system to classify the large amount of information collected and stored, which can meet the needs of the DEMA and provide them with the necessary services. Ahmed et al. [18] used three approaches, the Local Outlier Factor (a density-based method), Feature Bagging for Outlier Detection (an ensemble method), and Subspace Outlier Degree, to analyze anomalous data collected from a hydropower plant and compare their performance. The outliers were then verified by an expert using a feature selection process and a decision tree to identify the critical variables that could be associated with the anomalies. Valencia et al. [19] presented a procedure that uses Knowledge Discovery to analyze a dataset and extract structured information related to a hydroelectric power plant; this method can be used to train fault identification systems. Zhang et al. [20] proposed a decision tree-based clustering scheme to determine the various operating regimes in hydropower plants. The method uses k-means++ clustering to classify the data; a decision tree is then constructed from the group labels and other features, and finally analyzed and pruned according to the classification accuracy and complexity requirements. Reference [21] introduced the data mining concept into a SCADA system to help the hydropower plant’s operators make informed decisions, using the information extracted through data mining to determine the typical loading profiles of each generation unit. Sahin and Karakus [22] presented a study on energy generation forecasting for a hydroelectric plant based on Machine Learning and a hybrid Genetic Grey Wolf Optimizer-based Convolutional Neural Network/Recurrent Neural Network-Long Short-Term Memory regression approach. Their findings can help improve the efficiency of resource management and energy generation.

Two aspects stand out in the current applications of Knowledge Discovery (KD) in hydropower plants:

The vast majority of applications address a single data analysis task: the energy production relationship between upstream and downstream hydro plants, energy production forecasting, identifying the operating regimes, improving data quality, outlier detection, fault identification, or determining the typical operating profiles.
The analysis time horizon is limited to a day, a season, or a year.

Starting from these two remarks, the main contribution of the paper is the design, development, and testing of an original clustering-based data mining framework, integrated into a Knowledge Discovery module of the SCADA software of a hydropower plant, which performs multiple tasks:

Performing advanced statistical analysis and outlier detection;
Identifying the operating regimes and the hourly typical operating profiles;
Developing strategies for loading the generation units that consider the number of operating hours and minimize the amount of water used to satisfy the power required by the system.

The framework can offer comparative information for a time horizon longer than one year, allowing the quick identification of key performance indicators that characterize the operation of the hydropower plant.

The remainder of the paper is organized into four sections. Section 2 presents the theoretical aspects of integrating Knowledge Discovery and Data Mining into smart SCADA, Section 3 details the multi-task framework integrated into the Data Mining-based Knowledge Discovery module, Section 4 covers the case study in which the proposed framework has been tested using the SCADA database of a hydropower plant belonging to the Romanian HydroPower Company, and Section 5 highlights the conclusions and future work.

2. Knowledge Discovery and Data Mining in Smart SCADA

2.1. Knowledge Discovery vs. Data Mining

Knowledge Discovery (KD) and Data Mining (DM) have transformed power engineering research. Carrying out effective and meaningful research requires a deep understanding of the various aspects of data mining and knowledge discovery [23], together with the expertise of specialists working across the entire power system chain: energy generation, transmission, and distribution.

There are some misunderstandings about the terms data mining and knowledge discovery in databases. Although many specialists and researchers use DM as a synonym for knowledge discovery, DM is not the entire knowledge discovery process. Besides being loosely called data mining, KD is also known by other names, such as information discovery or knowledge extraction. KD is a process that aims to identify relationships and patterns in large datasets and is typically defined as a non-trivial process of identifying novel, useful, and understandable patterns. In a narrow sense, KD refers to extracting information from a data source; although it can be performed through various methods, the term usually refers to obtaining knowledge from textual or database data. The combined process, in which data mining is one of several steps, is referred to as the KD process.

Figure 1 shows the steps of a KD process, including Data Mining; each step is described below [21,23,24].

Data Selection. The initial step of KD is data selection, which involves gathering information from the SCADA database to create a raw dataset.

Data Cleaning (Preprocessing). Preprocessing ensures that the collected data are of good quality. It involves handling noise, missing values, and inconsistencies.
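A minimal sketch of this cleaning step is given below. It assumes, hypothetically, that each hourly SCADA record is available as a Python dict with a `timestamp` key and one key per generator power field (names such as `P_GU1` follow the field naming used later in the paper, but the record layout is an assumption). Consistent with the case study, where a blank cell means the unit was not loaded, blank power cells are set to 0 MW; records without a timestamp and repeated hours are dropped. Real SCADA exports may need other rules, such as interpolation for sensor gaps.

```python
def clean_records(records, power_fields):
    """Clean raw hourly SCADA records (hypothetical dict-per-hour layout)."""
    cleaned = []
    seen_hours = set()
    for rec in records:
        hour = rec.get("timestamp")
        # Drop records that cannot be placed on the time axis.
        if not hour:
            continue
        # Drop duplicated hours, keeping the first occurrence.
        if hour in seen_hours:
            continue
        seen_hours.add(hour)
        row = dict(rec)
        for name in power_fields:
            value = row.get(name)
            # A blank cell means the unit was not loaded, so it becomes 0 MW.
            if value is None or str(value).strip() == "":
                row[name] = 0.0
            else:
                row[name] = float(value)
        cleaned.append(row)
    return cleaned


# Illustrative records; "timestamp" is a month-day-hour string.
raw = [
    {"timestamp": "01-01-00", "P_GU1": "5.2", "P_GU2": ""},
    {"timestamp": "01-01-00", "P_GU1": "5.3", "P_GU2": "1.0"},  # duplicated hour
    {"timestamp": "", "P_GU1": "4.0", "P_GU2": "4.0"},          # missing timestamp
    {"timestamp": "01-01-01", "P_GU1": None, "P_GU2": "6.1"},
]
clean = clean_records(raw, ["P_GU1", "P_GU2"])
```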

Data Transformation. After cleaning, the data usually need to be transformed into a form suitable for mining. This can be carried out through various methods, such as feature engineering and scaling. To make the data easier to process for machine learning tools, label encoding was used to convert categorical data into a numerical format.
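The transformation step can be sketched as follows. This is an illustration, not the authors’ code: the categorical field (a unit status) and its values are hypothetical, and min-max scaling to [0, 1] is assumed as the scaling method, since the text does not state which one is used. Label encoding here assigns integer codes in order of first appearance.

```python
def label_encode(values):
    """Map each distinct category to an integer code, in order of first appearance."""
    mapping = {}
    codes = []
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)
        codes.append(mapping[v])
    return codes, mapping


def min_max_scale(column):
    """Rescale a numeric column to [0, 1]; a constant column maps to 0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]


# Illustrative inputs: a categorical unit status and a water level column.
status_codes, status_map = label_encode(["running", "stopped", "running", "maintenance"])
scaled_levels = min_max_scale([100.0, 101.0, 102.0, 104.0])
# status_codes == [0, 1, 0, 2]; scaled_levels == [0.0, 0.25, 0.5, 1.0]
```

Min-max scaling keeps the columns comparable in distance-based clustering, where a field measured in large units would otherwise dominate.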

Data Mining. The DM step involves uncovering patterns, anomalies, or relationships in the data. The DM process comprises numerous steps, each related to a specific discovery task. Knowledge extraction involves gathering and storing information, analyzing and visualizing the data, designing models for machine and human interaction, and applying efficient methods. One of the most frequently used techniques is clustering, which groups the data into distinct clusters based on similarity.

Interpretation. After Data Mining, the next step is to interpret the results. This involves understanding the clusters (patterns) and their features.

2.2. Data Mining Techniques

The Data Mining techniques are divided into two main categories: descriptive and predictive, see Figure 2 [25,26,27].

Descriptive data mining relies on tools such as cross-tabulation, correlation, and frequency analysis. It is used to identify similarities in the data and existing patterns. Another aim of this type of analysis is to find interesting subgroups by analyzing the data and transforming it into meaningful information. Descriptive data mining involves techniques such as clustering, association rule mining, and sequence discovery analysis.

Clustering. Clustering is commonly used in data mining to organize information by grouping related data points. It helps the DEMA identify patterns and similarities between different sets of information, and it can be used to classify and extract patterns in the data, identify anomalies, or analyze spatial data.

Association Rules. The goal is to find correlations (co-occurrence relationships) between items in the datasets from the database. Even if the datasets come from different sources, these correlations can reveal patterns that indicate process trends or explain the operating characteristics of the equipment/installations.
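The two standard measures behind association rules, support and confidence, can be illustrated with a short sketch. The transactions are hypothetical: each hour is treated as the set of generation units in operation, and the rule evaluated is {GU5} → {GU6}.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Illustrative transactions: the set of units in operation in each hour.
hours = [{"GU5", "GU6"}, {"GU1", "GU5", "GU6"}, {"GU5"}, {"GU1", "GU2"}]
sup = support(hours, {"GU5", "GU6"})          # 2 of 4 hours -> 0.5
conf = confidence(hours, {"GU5"}, {"GU6"})    # 0.5 / 0.75, about 0.667
```

A full miner (e.g., Apriori) would enumerate candidate itemsets and keep only those above minimum support and confidence thresholds; this sketch only evaluates one given rule.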

Sequence discovery analysis. The goal of sequence discovery analysis is to find sequential patterns, i.e., ordered sequences of events that occur frequently in the data. This usually involves identifying the patterns whose frequency exceeds a minimum support threshold.
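The support of a sequential pattern can be sketched as follows, assuming (hypothetically) that each sequence is the daily succession of operating regimes, labeled by their cluster. A pattern is contained in a sequence if its items appear in the same order, gaps allowed. The brute-force search below covers only pairs; it is an illustration of the support measure, not a full algorithm such as GSP or PrefixSpan.

```python
def contains_in_order(sequence, pattern):
    """True if the items of pattern occur in sequence in the same order (gaps allowed)."""
    remaining = iter(sequence)
    # 'item in remaining' consumes the iterator up to the match, enforcing the order.
    return all(item in remaining for item in pattern)


def sequence_support(database, pattern):
    """Fraction of sequences in the database that contain the pattern."""
    return sum(contains_in_order(s, pattern) for s in database) / len(database)


def frequent_pairs(database, min_support):
    """All ordered pairs (a, b) whose support reaches min_support (brute force)."""
    items = sorted({x for s in database for x in s})
    result = {}
    for a in items:
        for b in items:
            sup = sequence_support(database, (a, b))
            if sup >= min_support:
                result[(a, b)] = sup
    return result


# Illustrative daily successions of operating regimes (cluster labels).
days = [["R1", "R2", "R3"], ["R1", "R3"], ["R2", "R1", "R2"], ["R3"]]
pairs = frequent_pairs(days, min_support=0.5)
# pairs == {("R1", "R2"): 0.5, ("R1", "R3"): 0.5}
```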

The second category, predictive data mining, aims to predict future values of a given variable with a high degree of accuracy. It is carried out using supervised learning techniques, and three categories of methods are used: regression, classification, and time-series analysis. These methods are used in predictive analysis to model the data.

Regression. Regression is similar to classification in that it models the relationship between a target variable and one or more independent variables (predictors); unlike classification, the target is a continuous value. Because the target variable can be influenced by several predictors, regression analysis predicts the outcome from the input fields relevant to the target.

Classification. Classification assigns data to predefined classes based on their features, which helps identify patterns and extract meaningful insights; it can also help improve data quality. After the data have been collected and analyzed, the appropriate features are selected, and a suitable classification algorithm is chosen, for example:

Support Vector Machine. This algorithm creates a boundary (hyperplane) between the different classes/patterns, choosing the one with the largest margin to the closest training points (the support vectors).
Decision Tree. The classification is carried out using a tree-based structure in which a set of conditions categorizes the data. The root and internal nodes hold the test conditions, while the leaf nodes hold the outcomes.
Neural Network. A neural network model is a computational model that can recognize the relationships between input and output data. Its units, which act like neurons, connect the inputs to the outputs; the model weighs the strength of these connections and passes the information through one or more hidden layers. Like the human brain, a neural network requires training to be effective. Although neural networks can be hard to interpret, the models are reliable and can classify data not seen during training.
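The idea of a test condition at the root and outcomes at the leaves can be shown with a one-level decision tree (a decision stump), chosen here for brevity. The split criterion, Gini impurity as used in CART, is an assumption of this sketch rather than a method stated in the text, and the data are illustrative.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(x, y):
    """One-level tree: root test 'x <= t', one leaf (majority class) per branch."""
    best = None
    for t in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        if not left or not right:
            continue
        # Weighted impurity of the two children; lower is better.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or score < best["impurity"]:
            best = {
                "impurity": score,
                "threshold": t,
                "left": Counter(left).most_common(1)[0][0],
                "right": Counter(right).most_common(1)[0][0],
            }
    return best


def predict(stump, value):
    """Apply the root test and return the outcome stored in the chosen leaf."""
    return stump["left"] if value <= stump["threshold"] else stump["right"]


# Illustrative data: active power of a unit (MW) labeled by loading level.
power = [2.0, 3.0, 4.0, 20.0, 22.0, 25.0]
level = ["low", "low", "low", "high", "high", "high"]
stump = fit_stump(power, level)
```

A full decision tree applies the same split search recursively to each child until a stopping or pruning criterion is met.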

A smart SCADA system can improve the efficiency of hydropower plant operation by combining Knowledge Discovery and statistical analysis. Figure 3 displays the interdependencies between Knowledge Discovery and smart SCADA (adapted from [28]).

Data are central to the operation of a hydropower plant, and the way they are collected, analyzed, and turned into control actions makes the difference between classical and smart SCADA systems [29]. One of the biggest obstacles facing the transformation from a classical to a smart SCADA system is the lack of investment in hydropower processes. Many monitoring and control devices, i.e., the remote terminal units and sensors of the hydropower processes, are old and should be replaced, but upgrading them is very costly. A smart infrastructure that integrates data analytics modules alongside the existing hardware can be the solution for hydropower companies, since starting with the software is much easier than replacing the hardware. Thus, the transition from a classical to a smart SCADA system will be gradual; the goal of a gradual transition is to make sure that the new system is stable before work starts on the full smart SCADA infrastructure. The smart SCADA is built using knowledge discovery tools and delivers a unique and comprehensive solution for processing data from the hydropower aggregates/equipment. This holistic approach allows the tasks to be fulfilled both at the hydropower plant level, by the control room operator, and at the central level, by the dispatcher of the hydropower arrangement.

3. Multi-Task Framework Integrated into the Knowledge Discovery Module

The main steps in the multi-task framework related to developing the Knowledge Discovery module based on Data Mining to facilitate the transition from the classical to smart SCADA system in the hydropower plants are discussed in this section.

Figure 4 presents the basic structure of the automation architecture described in [30], including the proposed Knowledge Discovery module in the SCADA system implemented at the power plant level.

Data acquisition is performed on the main hydro components (turbine, generator, power transformer, substation), and the data are recorded in the SCADA system and made available to the control-room operator through the Human–Machine Interface. The Data Mining-based Knowledge Discovery module will be implemented at the SCADA level to enable data-driven decision-making, thereby ensuring the transition toward smart SCADA.

Figure 5 shows the flow chart of the multi-task framework integrated into the Knowledge Discovery module. Details regarding each task are provided below.

The SCADA system collects the water flows in the pipes that supply the turbines (WF_Pipe_1, …, WF_Pipe_n, and WF_tot_Pipes), in [m³]; the active powers produced by the generation units (GUs) supplied through Pipe_1, …, Pipe_n (P_GU_Pipe_1, …, P_GU_Pipe_n); and the active and reactive powers requested from the hydropower plant by the system (P_req and Q_req), in [MW] and [MVAr]. It also contains information about various technical parameters of the generation units, such as the active and reactive powers (P_GU1, …, P_GUn, and Q_GU1, …, Q_GUn), in [MW] and [MVAr]; the stator voltage and current (Vs_GU1, …, Vs_GUn and Is_GU1, …, Is_GUn), in [kV] and [kA]; the excitation voltage and current (Vex_GU1, …, Vex_GUn and Iex_GU1, …, Iex_GUn), in [V] and [A]; and the upstream and downstream water levels of the reservoir (WLr_u and WLr_d), in [mdMB]. As an example, Table A1 in Appendix A lists this information for one day. All variables are recorded in a table with the time details in the format month–day–hour for each year, as seen in Figure 6.

Task 1—Statistical analysis and outliers’ detection.

The statistical analysis explores and presents large amounts of data on the technical parameters through indicators such as the mean, standard deviation, confidence level, and quartiles: Q0 (minimum value), Q1 (25th percentile), Q2 (50th percentile), Q3 (75th percentile), and Q4 (maximum value). In addition, the boxplot is used to show the spread and skewness of the variables in the database through their quartiles. It can also include lines, known as whiskers, that extend from the box to indicate variability outside the lower and upper quartiles. Outliers that differ significantly from the rest of the data can be plotted individually on the box-and-whisker diagram [31]. Thus, the DEMA (represented by the control-room operator) can select the fields containing the values of the monitored parameters. For outlier detection, a rule-based algorithm based on the quartile values has been integrated [32,33]. The two rules refer to Q0 and Q4:

$$\text{IF } X^{h} < Q_{0}^{x} \text{ THEN } X^{h} \text{ is an outlier} \tag{1}$$

$$\text{IF } X^{h} > Q_{4}^{x} \text{ THEN } X^{h} \text{ is an outlier} \tag{2}$$

where $X^{h}$ is the hourly value of the analyzed technical parameter and $x$ refers to the name of the analyzed technical parameter (identical to the name of the field).
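A minimal sketch of the descriptive statistics and of rules (1) and (2) follows. One assumption needs stating loudly: if Q0 and Q4 were taken literally as the minimum and maximum of the same sample, no value could fall outside them, so this sketch interprets them as the lower and upper boxplot (whisker) limits and computes them as Tukey’s fences, Q1 − 1.5·IQR and Q3 + 1.5·IQR. The 95% confidence interval of the mean uses a normal approximation, another assumption. The hourly values are illustrative.

```python
import math
import statistics


def describe(values):
    """Mean, sample standard deviation, 95% confidence interval of the mean and Q0..Q4."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    m = statistics.mean(values)
    s = statistics.stdev(values)
    # Normal approximation for the confidence interval (an assumption of this sketch).
    half_width = 1.96 * s / math.sqrt(len(values))
    return {"m": m, "sigma": s, "ci95": (m - half_width, m + half_width),
            "Q0": min(values), "Q1": q1, "Q2": q2, "Q3": q3, "Q4": max(values)}


def outlier_limits(stats, k=1.5):
    """Lower and upper boxplot limits (Tukey fences), used here in place of Q0 and Q4."""
    iqr = stats["Q3"] - stats["Q1"]
    return stats["Q1"] - k * iqr, stats["Q3"] + k * iqr


def flag_outliers(values, lower, upper):
    """Rules (1) and (2): hours whose value falls below the lower or above the upper limit."""
    return [h for h, x in enumerate(values) if x < lower or x > upper]


# Illustrative hourly values of one technical parameter.
hourly = [10.0, 10.2, 10.1, 10.3, 10.2, 10.1, 14.9, 10.0]
stats = describe(hourly)
lower, upper = outlier_limits(stats)
outliers = flag_outliers(hourly, lower, upper)   # hour index 6 (14.9) is flagged
```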

However, in some cases, the outliers identified for certain technical parameters correspond to operating regimes that differ from most regimes but do not violate the allowable limits. In these cases, an attention message alerts the DEMA, who verifies only those regimes. In all other cases, the least squares method is applied to the identified outliers to estimate the “true” value, based on a regression model determined for that parameter.
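The decision logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ model: the regression model is a straight line fitted by ordinary least squares on a single predictor x (the choice of predictor, e.g., total water flow, is hypothetical, since the text does not name the regressors), and it is fitted only on the hours not flagged as outliers. The arguments `allowed_min` and `allowed_max` are hypothetical allowable limits of the parameter.

```python
def fit_least_squares(x, y):
    """Ordinary least squares fit of y = a + b * x (closed form, one predictor)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def handle_outliers(x, y, outlier_hours, allowed_min, allowed_max):
    """Alert on in-limit outliers; replace limit violations with the regression estimate."""
    excluded = set(outlier_hours)
    normal = [h for h in range(len(y)) if h not in excluded]
    # Fit the regression model only on the hours that were not flagged.
    a, b = fit_least_squares([x[h] for h in normal], [y[h] for h in normal])
    corrected, messages = list(y), []
    for h in outlier_hours:
        if allowed_min <= y[h] <= allowed_max:
            messages.append(f"Attention: hour {h} differs from most regimes; please verify.")
        else:
            corrected[h] = a + b * x[h]   # estimated "true" value
    return corrected, messages


# Illustrative data: predictor x (e.g., total water flow) and the analyzed parameter y.
flow = [10.0, 20.0, 30.0, 40.0, 50.0]
level = [1.0, 2.0, 3.0, 40.0, 5.0]
fixed, alerts = handle_outliers(flow, level, [3], allowed_min=0.0, allowed_max=10.0)
```

Here the value at hour 3 violates the limits and is replaced by the fitted estimate (about 4.0); an outlier inside the limits would instead be kept and reported in `alerts`.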

Several years can be compared by choosing the analysis period.

Task 2—Determining the operating regimes.

This task is based on a clustering-based data mining process. The input data form a matrix built from the values of the technical parameters selected by the DEMA. The regimes are identified either through typical operating profiles, divided into the intervals associated with the generation units (GU1, …, GUn), or through hourly loading patterns, which contain the hourly average power of each generation unit. The DEMA can choose any algorithm from the hierarchical clustering category (complete-linkage, single-linkage, average-linkage, centroid-linkage, median-linkage, or Ward-linkage clustering) or the K-means clustering algorithm. In hierarchical clustering, a measure of dissimilarity between sets of observations is needed to decide which clusters should be merged or kept separate. This is usually obtained from a distance between individual observations combined with a linkage criterion, which specifies the dissimilarity of two sets as a function of the pairwise distances between their observations.
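The role of the linkage criterion can be illustrated with a naive agglomerative clustering sketch. It assumes Euclidean distance and implements three of the listed criteria (single, complete, and average linkage); centroid, median, and Ward linkage are omitted for brevity. The two-unit loading patterns are illustrative.

```python
import math
from itertools import combinations


def linkage_distance(a, b, points, method):
    """Dissimilarity between clusters a and b (lists of point indices)."""
    pairwise = [math.dist(points[i], points[j]) for i in a for j in b]
    if method == "single":      # closest pair
        return min(pairwise)
    if method == "complete":    # farthest pair
        return max(pairwise)
    if method == "average":     # mean of all pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unsupported linkage: {method}")


def agglomerative(points, n_clusters, method="average"):
    """Naive bottom-up clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage_distance(clusters[pair[0]], clusters[pair[1]], points, method),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i stays valid
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels


# Illustrative hourly loading patterns (P_GU1, P_GU2) in MW.
patterns = [(1.0, 0.0), (1.5, 0.0), (10.0, 10.0), (10.5, 9.5), (25.0, 0.0), (24.0, 1.0)]
labels = agglomerative(patterns, n_clusters=3, method="complete")
```

This brute-force version is far too slow for a year of hourly data; a library routine such as SciPy’s `scipy.cluster.hierarchy.linkage`, which supports single, complete, average, centroid, median, and Ward methods, would be the practical choice.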

When the DEMA requests that the operating regimes be characterized through typical operating profiles, the input matrix for the clustering process is built from the fields containing the active power produced by each generation unit over a given time horizon (usually one year, to cover all operating regimes).

The matrix size is $H_{year} \times (N_{HPP}^{GU} \times 24 + 2)$, where $H_{year}$ represents the number of hours in which at least one generation unit operated and $N_{HPP}^{GU}$ represents the number of generation units in the hydropower plant. The two additional columns correspond to the upstream and downstream water levels of the reservoir.

Each obtained pattern is associated with a typical operating profile that characterizes the operating regime of the hydropower plant in certain periods (days within a year). The DEMA can identify the operating regimes of the hydropower plant by following, along the time axis, the hourly loading degree of all generation units and the water volume used in each operating regime. Depending on the forecast of the power requested by the system, the DEMA can then establish an operating strategy for the plant for the next day.
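Following the regimes along the time axis and the water volume used in each can be sketched as a simple aggregation. The sketch assumes, hypothetically, one cluster label per day and a daily water volume derived from the pipe flow fields; the values are illustrative.

```python
from collections import defaultdict


def regime_summary(day_labels, day_water_m3):
    """Number of days and total water volume per operating regime (cluster label)."""
    days, water = defaultdict(int), defaultdict(float)
    for label, volume in zip(day_labels, day_water_m3):
        days[label] += 1
        water[label] += volume
    return dict(days), dict(water)


def regime_periods(day_labels):
    """Consecutive runs (first_day, last_day, label): the regimes along the time axis."""
    periods, start = [], 0
    for d in range(1, len(day_labels) + 1):
        if d == len(day_labels) or day_labels[d] != day_labels[start]:
            periods.append((start, d - 1, day_labels[start]))
            start = d
    return periods


# Illustrative: one cluster label per day and the daily water volume in m3.
labels = [0, 0, 1, 1, 1, 0]
volumes = [100.0, 120.0, 300.0, 310.0, 290.0, 110.0]
days, water = regime_summary(labels, volumes)
periods = regime_periods(labels)   # [(0, 1, 0), (2, 4, 1), (5, 5, 0)]
```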

The input matrix is different when the DEMA requests that the operating regimes be characterized through hourly loading patterns. It is again built from the fields containing the active power produced by each generation unit over a given time horizon (usually one year, to cover all operating regimes), but its size is $H_{year} \times (N_{HPP}^{GU} + 2)$, where $H_{year}$ and $N_{HPP}^{GU}$ have the same meaning as above. The last two columns correspond to the water levels of the reservoir.
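The sketch below builds the hourly loading-pattern matrix described above and clusters it with plain k-means (Lloyd’s algorithm). Assumptions: records are cleaned dicts whose field names (`P_GU1`, `WLr_u`, `WLr_d`, …) mirror the SCADA fields listed earlier but whose layout is hypothetical; the number of clusters k is given; and the columns are left unscaled for brevity, although in practice powers and water levels would be scaled first because they have different units.

```python
import math
import random


def build_hourly_matrix(records, unit_fields, level_fields):
    """H_year x (N + 2) matrix: one row per hour in which at least one unit operated."""
    matrix = []
    for rec in records:
        powers = [rec[name] for name in unit_fields]
        if any(p > 0 for p in powers):
            matrix.append(powers + [rec[name] for name in level_fields])
    return matrix


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's k-means with Euclidean distance and random initial centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each centroid becomes the mean of its rows (kept if the cluster is empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Illustrative hourly records for two units plus upstream/downstream levels.
records = [
    {"P_GU1": 0.0, "P_GU2": 0.0, "WLr_u": 150.0, "WLr_d": 120.0},  # no unit running
    {"P_GU1": 5.0, "P_GU2": 0.0, "WLr_u": 150.0, "WLr_d": 120.0},
    {"P_GU1": 5.5, "P_GU2": 0.0, "WLr_u": 150.1, "WLr_d": 120.0},
    {"P_GU1": 0.0, "P_GU2": 20.0, "WLr_u": 149.9, "WLr_d": 120.1},
    {"P_GU1": 0.0, "P_GU2": 21.0, "WLr_u": 150.0, "WLr_d": 120.1},
]
X = build_hourly_matrix(records, ["P_GU1", "P_GU2"], ["WLr_u", "WLr_d"])
labels, centroids = kmeans(X, k=2)
```

Each centroid can then be read as an hourly loading pattern: the average power of each unit, plus the average water levels, for the hours assigned to that cluster.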

Task 3—Developing the strategies to load the generation units.

This task relies on an expert system that uses the operating regimes characterized by the hourly loading patterns determined above, together with the number of operating hours. The operating hours are recorded in a database and considered in the decision-making process to avoid overloading the generation units over long periods, which minimizes the number of maintenance operations. The main objective is to minimize the amount of water used to satisfy the power required by the system. The main components of the expert system are summarized below [34,35].

The knowledge base is composed of two main elements: the rules base (which contains the knowledge required to solve problems) and the facts base (the patterns obtained in the clustering-based data mining are recorded in this base).
The inference engine determines how the knowledge in the rules base is used to interpret the data in the information base. It can perform various tasks, such as confirming or rejecting a hypothesis or the solution to a problem.
The editor of the knowledge base provides the DEMA with the ability to update and inspect the information base’s content, particularly its rules base’s content.
The explanation system can provide explanations for the stages in the Expert System’s reasoning.
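How these components can fit together is sketched below. Everything here is hypothetical and illustrative, not the authors’ expert system: the `Pattern` structure, the three rules, the operating-hour threshold, and the water figures. The facts base holds clustered loading patterns, the rules base is three ordered filters, the inference step applies them in a fixed order, and the returned trace plays the role of the explanation system.

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    """Fact: a typical hourly loading pattern obtained from clustering (illustrative)."""
    name: str
    unit_power_mw: dict
    water_m3_per_h: float


def infer(patterns, p_req_mw, operating_hours, max_hours):
    """Apply the rules in a fixed order and record an explanation of each step."""
    trace = []
    # Rule 1: the pattern must cover the power requested by the system.
    candidates = [p for p in patterns if sum(p.unit_power_mw.values()) >= p_req_mw]
    trace.append(f"R1: {[p.name for p in candidates]} cover {p_req_mw} MW")
    # Rule 2: do not load units whose accumulated operating hours exceed the threshold.
    candidates = [
        p for p in candidates
        if all(operating_hours.get(u, 0) <= max_hours
               for u, mw in p.unit_power_mw.items() if mw > 0)
    ]
    trace.append(f"R2: {[p.name for p in candidates]} avoid units above {max_hours} h")
    if not candidates:
        trace.append("No pattern satisfies all rules; the DEMA must decide.")
        return None, trace
    # Rule 3: among admissible patterns, choose the one using the least water.
    best = min(candidates, key=lambda p: p.water_m3_per_h)
    trace.append(f"R3: {best.name} selected ({best.water_m3_per_h} m3/h)")
    return best, trace


# Illustrative facts base and accumulated operating hours.
facts = [
    Pattern("A", {"GU5": 40.0, "GU6": 0.0}, 90000.0),
    Pattern("B", {"GU5": 0.0, "GU6": 45.0}, 85000.0),
    Pattern("C", {"GU5": 25.0, "GU6": 25.0}, 110000.0),
    Pattern("D", {"GU5": 20.0, "GU6": 0.0}, 40000.0),
]
hours = {"GU5": 5200, "GU6": 6100}
best, explanation = infer(facts, p_req_mw=38.0, operating_hours=hours, max_hours=6000)
```

With these numbers, pattern A is chosen: B and C would load GU6, which is above the threshold, and D cannot cover 38 MW. Raising the threshold to 7000 h would select B, the pattern with the lowest water use.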

4. Case Study

The proposed framework has been tested using the SCADA system’s database of a hydropower plant belonging to the Romanian HydroPower Company.

The plant, marked by the red circle in Figure 7, is the first in a hydro arrangement located on an important river in eastern Romania. The plant has six Francis-type units. The first water pipe supplies the last two units, and the second pipe supplies the first four units. The first four units have a total installed power of 27.5 MW, while units five and six have 50 MW.

The SCADA database includes fields recorded every hour with the following technical parameters: the water flow in each pipe, the total water flow in the two pipes, the total power produced by the first four generation units (GU1–GU4), the power produced by GU5, the power produced by GU6, the total active and reactive power produced by the plant, the frequency, the stator voltage, stator current, active and reactive power, excitation voltage, and excitation current of each generation unit, and the water levels of the reservoir (upstream and downstream).

Figure 8 shows the SCADA file associated with one day from the database, containing the fields listed above. Blank cells indicate that the corresponding generation unit was not loaded. The results obtained at the year level for each task integrated into the proposed framework are presented below.

After data processing, the DEMA can view summary information on the operation of the plant, namely the number of operating hours and the total energy produced by each generation unit (GU1–GU6), see Figure 9.
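The summary reduces to counting loaded hours and summing hourly power per unit. A minimal sketch, with a hypothetical function name, applied to the active-power columns of the sample day in Table A1; at one-hour resolution, the sum of the hourly MW values equals the energy in MWh.

```python
def unit_summary(hourly_mw):
    """Operating hours and energy per GU from hourly active power [MW].

    hourly_mw -- {unit: list of hourly MW values}; at 1-h resolution,
    the sum of the hourly MW values equals the energy in MWh.
    """
    return {
        unit: {"hours": sum(1 for p in series if p > 0), "energy_mwh": sum(series)}
        for unit, series in hourly_mw.items()
    }


# Active-power columns of the sample day in Table A1 (hours 1-24).
day = {
    "GU1": [20] * 24,
    "GU2": [0] * 7 + [20] * 9 + [0] * 8,
    "GU3": [19] * 24,
    "GU4": [0] * 9 + [18] * 7 + [0] * 8,
    "GU5": [40] * 6 + [0] * 10 + [40] * 8,
    "GU6": [0] * 10 + [30, 30, 32] + [0] * 11,
}
for unit, s in unit_summary(day).items():
    print(unit, s["hours"], "h", s["energy_mwh"], "MWh")
```

For this day the sketch gives 24 h and 480 MWh for GU1, 9 h and 180 MWh for GU2, 24 h and 456 MWh for GU3, 7 h and 126 MWh for GU4, 14 h and 560 MWh for GU5, and 3 h and 92 MWh for GU6.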

Table 1 presents the results of the advanced statistical analysis: the mean (m), standard deviation (σ), and quartiles (Q0, Q1, Q2, Q3, and Q4) of each technical parameter in the database. The values were calculated only for the hours when at least one unit was in operation. The DEMA also has boxplots available, from which outliers can be identified quickly.
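The statistics of Table 1 and the boxplot-based outlier screening can be sketched with Python's standard `statistics` module. The paper does not state which fence rule its boxplots use or whether σ is the sample or population deviation; this sketch assumes the conventional Tukey fences at 1.5 × IQR beyond Q1 and Q3 and the sample standard deviation, and the function names are hypothetical. As in the text, the input should contain only the hours when at least one unit was in operation.

```python
import statistics


def describe(values):
    """Mean, sample standard deviation and quartiles Q0..Q4 of one parameter."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "m": statistics.mean(values),
        "sigma": statistics.stdev(values),
        "Q0": min(values), "Q1": q1, "Q2": q2, "Q3": q3, "Q4": max(values),
    }


def boxplot_outliers(values, k=1.5):
    """Return (index, value) pairs outside the Tukey fences Q1 - k*IQR, Q3 + k*IQR."""
    s = describe(values)
    iqr = s["Q3"] - s["Q1"]
    low, high = s["Q1"] - k * iqr, s["Q3"] + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < low or v > high]


print(boxplot_outliers([10, 11, 12, 11, 10, 12, 11, 30]))  # prints [(7, 30)]
```

Returning the index with each value lets the flagged reading be traced back to its hour in the SCADA file.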

Two situations are presented below. The first concerns the loading of the generation units when the identified outliers, associated with operating regimes that differ from most regimes, did not violate the allowable limits, see Figure 10. In these cases, an attention message is sent to the DEMA, who must verify the regimes and establish whether any disturbance occurred.

The second situation concerns the recorded downstream water level of the reservoir, where several outliers exceeding the upper limit were identified. For these values, the least squares method was applied to estimate the “true” value.
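The distinction between the two situations can be expressed as a small decision rule. This is a hedged sketch: the fences are assumed to come from the boxplot analysis, while the allowable limits are plant-specific inputs that the paper does not list, so the numbers in the usage lines are purely hypothetical, as is the function name.

```python
def classify_reading(value, fences, allowable):
    """Label one hourly reading of a technical parameter.

    fences    -- (low, high) boxplot fences of the parameter
    allowable -- (low, high) allowable operating limits (plant-specific)
    """
    if fences[0] <= value <= fences[1]:
        return "normal"
    if allowable[0] <= value <= allowable[1]:
        # Outlier, but within limits: send an attention message to the DEMA.
        return "attention"
    # Outlier outside the allowable limits: candidate for re-estimation.
    return "limit violation"


# Hypothetical fences and limits, for illustration only.
print(classify_reading(25.0, (8.9, 13.9), (0.0, 28.0)))  # prints "attention"
```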

The outliers below Q0 depend on the upstream water level of the reservoir and did not lead to violations of the allowable limits. Figure 11 and Figure 12 present the results obtained in these cases (with and without outliers above the maximum limit).
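The paper does not specify the regression model behind the least-squares estimate. A minimal sketch, assuming a straight line fitted over time (hour index) to the readings that are not flagged as outliers, with each flagged reading replaced by the fitted value; both the function name and the choice of time as the regressor are assumptions.

```python
def least_squares_fill(values, outlier_idx):
    """Replace flagged readings with an ordinary least-squares line fitted in time.

    values      -- hourly readings (e.g., downstream water level)
    outlier_idx -- indices of readings flagged as outliers
    Assumes at least two non-flagged readings at distinct hours.
    """
    bad = set(outlier_idx)
    xs = [i for i in range(len(values)) if i not in bad]
    ys = [values[i] for i in xs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Keep good readings, substitute the fitted value for each outlier.
    return [intercept + slope * i if i in bad else v for i, v in enumerate(values)]


levels = [100.0, 100.1, 100.2, 105.0, 100.4, 100.5]  # illustrative values only
print(least_squares_fill(levels, [3])[3])  # approximately 100.3
```

A local window around each outlier, or a regression on a physically related variable, could replace the global time trend; the paper does not say which was used.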

The second task concerns determining the operating regimes, identified through the typical operating profiles divided into intervals associated with the generation units (GU1, …, GU6), and the hourly loading patterns, which include the hourly average power of each generation unit. This task is based on a clustering-based data mining process.

The input data form a matrix built from the hourly active powers of all six generation units, selected from the database over three successive years (2017–2019). The K-means clustering algorithm was used to obtain the typical operating profiles presented in Figure 13, Figure 14, Figure 15 and Figure 16.
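A compact sketch of the clustering step. Two assumptions are made: each day's 24 × 6 matrix of hourly powers is divided by that day's energy (one plausible reading of the [MW/MWh] unit in Tables A2–A5), and the centroids are seeded by farthest-first traversal, which may differ from the initialization the authors used. The iteration itself is standard Lloyd's K-means; with k = 4 it would yield four profiles analogous to P1–P4. Function names are hypothetical.

```python
import numpy as np


def daily_profiles(hourly_power):
    """hourly_power: array (days, 24, n_units) in MW -> rows normalized by daily energy."""
    days = hourly_power.shape[0]
    X = hourly_power.reshape(days, -1).astype(float)
    energy = X.sum(axis=1, keepdims=True)  # MWh per day at 1-h resolution
    energy[energy == 0] = 1.0              # leave idle days as all-zero rows
    return X / energy


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's K-means with farthest-first seeding; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    while len(centroids) < k:
        # Next seed: the row farthest from all seeds chosen so far.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

Normalizing by daily energy makes days with the same shape but different output fall into the same profile; clustering raw MW values instead would separate them by level.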

Table A2, Table A3, Table A4 and Table A5 in Appendix A give the hourly values of each operating regime (identified through Patterns P1–P4), containing the typical operating profiles divided into the intervals associated with the generation units (GU1, …, GU6). The input data used to obtain the hourly loading patterns contain the following fields: the water flows in the two pipes, the loading of each generation unit, and the upstream and downstream water levels of the reservoir.

The K-means clustering algorithm was used to extract the patterns, and the optimal number of clusters was 25. Figure 17, Figure 18 and Figure 19 show how the hydropower plant was operated, according to the obtained hourly loading patterns of the generation units, in three consecutive years (2017–2019).
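The per-pattern features reported in Table A7 can be derived from hourly pattern assignments. A sketch, assuming each hour has already been labeled with its pattern (for example, by nearest centroid) and paired with the power requested by the system; the function name is hypothetical. The Table A7 figures are consistent with hours per day being the ratio of hours to days rounded to the nearest integer (e.g., 432/128 ≈ 3 for P1 in 2017), which the sketch follows.

```python
from collections import defaultdict
from datetime import datetime


def pattern_features(records):
    """records: iterable of (timestamp: datetime, pattern: str, p_req_mw: float)."""
    hours = defaultdict(int)
    days = defaultdict(set)
    powers = defaultdict(list)
    for ts, pattern, p_req in records:
        key = (ts.year, pattern)
        hours[key] += 1
        days[key].add(ts.date())
        powers[key].append(p_req)
    table = {}
    for key, n_hours in hours.items():
        n_days = len(days[key])
        table[key] = {
            "hours": n_hours,
            "days": n_days,
            "hours_per_day": round(n_hours / n_days),  # Python rounds ties to even
            "p_min": min(powers[key]),
            "p_avg": sum(powers[key]) / len(powers[key]),
            "p_max": max(powers[key]),
        }
    return table


# Illustrative records only: three hours on 1 January and one on 2 January 2017.
recs = [(datetime(2017, 1, 1, h), "P1", 30.0 + h) for h in range(3)]
recs.append((datetime(2017, 1, 2, 5), "P1", 40.0))
print(pattern_features(recs)[(2017, "P1")])
```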

Table A6 and Table A7 in Appendix A detail the hourly loading patterns of the generation units and their features regarding the operating conditions over the analyzed three-year period. This task improves the quality of the obtained solutions and of the decision-making processes involved in loading the generation units.

The third task is based on an expert system, embedded in the module, that analyzes the hourly patterns and operating conditions of the units to determine the optimal loading solution depending on the number of operating hours and the power requested by the system. Figure 20, Figure 21 and Figure 22 present the results obtained for a representative day in 2020.

It can be observed that the loading of the six GUs in the developed strategy differs from that of the experience-based strategy adopted by the DEMA. Table 2 presents the differences between the experience-based strategy and the developed expert system-based strategy. The colors have the following meaning:

Red is associated with the generation units that were loaded in the experience-based strategy but not in the expert system-based strategy (sign “−”).
Blue is associated with the generation units that were loaded in the expert system-based strategy but not in the experience-based strategy (sign “+”).
Green is associated with the generation units loaded in both strategies, either with the same loading (value “0”) or with a different loading in the expert system-based strategy (sign “+” or “−”).
Yellow is associated with the generation units that were not loaded in either strategy.
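The color coding of Table 2 follows a simple rule on the two loadings of each unit. A minimal sketch, with a hypothetical function name, assuming the difference is computed as expert system-based loading minus experience-based loading, which reproduces the signs described for each color.

```python
def compare_strategies(experience, expert, tol=1e-6):
    """Color and difference per GU for one hour.

    experience, expert -- {unit: MW}; loadings at or below tol count as "not loaded".
    """
    out = {}
    for unit in sorted(set(experience) | set(expert)):
        p_exp = experience.get(unit, 0.0)
        p_es = expert.get(unit, 0.0)
        on_exp, on_es = p_exp > tol, p_es > tol
        if on_exp and not on_es:
            color = "red"      # dropped by the expert system (negative difference)
        elif on_es and not on_exp:
            color = "blue"     # added by the expert system (positive difference)
        elif on_exp and on_es:
            color = "green"    # loaded in both (zero, positive or negative)
        else:
            color = "yellow"   # loaded in neither
        out[unit] = (color, p_es - p_exp)
    return out
```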

The proposed strategy loads the first four GUs between 60% and 72% of their rated power and loads GU5 and GU6 only when the system requires higher power. These generation units thus remain available for ancillary services.
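The loading rule described above can be illustrated with a simple heuristic. This is not the authors' expert system: it assumes ratings of 27.5 MW for each of GU1–GU4 and 50 MW for each of GU5 and GU6, commits the small units with the fewest accumulated operating hours first, shares the load equally among them up to 72% of rated power, and only then calls on GU5 or GU6. The water-use objective and the lower 60% bound are not enforced, so some demands yield set-points below the band. All names are hypothetical.

```python
import math

RATED_MW = {"GU1": 27.5, "GU2": 27.5, "GU3": 27.5, "GU4": 27.5, "GU5": 50.0, "GU6": 50.0}
SMALL_UNITS = ("GU1", "GU2", "GU3", "GU4")
LARGE_UNITS = ("GU5", "GU6")
UPPER_BAND = 0.72  # upper loading limit of GU1-GU4, fraction of rated power


def dispatch(p_req, op_hours):
    """Return {unit: MW} for one hour (illustrative heuristic)."""
    setpoints = {u: 0.0 for u in RATED_MW}
    if p_req <= 0:
        return setpoints
    upper = UPPER_BAND * RATED_MW["GU1"]  # about 19.8 MW per small unit
    # Fewest accumulated operating hours first, to balance wear.
    small = sorted(SMALL_UNITS, key=lambda u: op_hours[u])
    n = min(len(small), math.ceil(p_req / upper))
    share = min(p_req / n, upper)
    for u in small[:n]:
        setpoints[u] = share
    # Any demand above four small units at the upper band goes to GU5/GU6.
    remainder = p_req - n * share
    for u in sorted(LARGE_UNITS, key=lambda u: op_hours[u]):
        if remainder <= 0:
            break
        setpoints[u] = min(remainder, RATED_MW[u])
        remainder -= setpoints[u]
    return setpoints


op_hours = {"GU1": 500, "GU2": 100, "GU3": 300, "GU4": 200, "GU5": 900, "GU6": 50}
print(dispatch(57.0, op_hours))  # GU2, GU3 and GU4 at 19.0 MW each; the others at 0.0
```

Sorting by operating hours is one simple way to reflect the wear-balancing criterion mentioned for Task 3; a real rules base would also weigh the water consumption of each combination.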

Table 3 presents the specific indicators for each GU obtained for operating the HPP in 2020 with the proposed strategy regarding the total operating time, the total energy production, and the average loading.

A comparison between the results obtained with the adopted expert system-based strategy in 2020 and those obtained with the experience-based strategy in 2017 revealed an increase in the average loading of all six GUs: 21.5% (GU1), 18.9% (GU2), 16.9% (GU3), 14.8% (GU4), 6.3% (GU5), and 57.7% (GU6). The smallest increase was observed for GU5 and the largest for GU6, while the increases for the first four GUs ranged between 14.8% and 21.5%. The higher loading of GU6 is explained by the fact that it was under maintenance for two years and worked very few hours during that time. The total energy production increased by 20.8%, from 313.6 GWh to 378.7 GWh, although the total operating time of all GUs decreased by 5.1%, from 13,653 h to 12,960 h. The years 2018 and 2019 are not considered in this comparison because GU6 was stopped in 2018 and operated only a few hours in 2019.
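The relative changes quoted above can be checked directly from the reported totals (energy in the same unit for both years):

```python
def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


energy_change = pct_change(313.6, 378.7)  # total energy production, 2017 -> 2020
hours_change = pct_change(13653, 12960)   # total operating time of all GUs [h]
print(f"energy: {energy_change:+.1f}%, operating time: {hours_change:+.1f}%")
# prints "energy: +20.8%, operating time: -5.1%"
```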

5. Conclusions

As the world moves toward a more sustainable energy future, the need for reliable and dispatchable sources of electricity is increasing. Quick data processing and the extraction of hidden patterns are important factors in improving the operation of hydropower plants. Knowledge discovery can be an efficient tool for addressing various challenges, among them the optimal operation of hydropower plants.

This paper proposes a framework that combines data mining and knowledge discovery to help the transition from a traditional SCADA system to a smart one in hydropower plants. It will allow the DEMA (operators from the control room) to identify the outliers and implement effective strategies to minimize water consumption and maximize power generation. Based on the advanced statistical tools, the framework will also help them identify the optimal operating conditions for the plant. The performance has been tested in a Romanian hydropower plant using the SCADA database. It allowed the control room operators to obtain comparative information on the plant’s performance over a longer time horizon (in our case study, three years). The results of the tests revealed the utility of a knowledge discovery module in helping the control room operators improve the efficiency of their operations by transitioning toward smart SCADA systems. Thus, a comparison between results obtained with the adopted expert system-based strategy in 2020 and those obtained with the experience-based strategy in 2017 revealed an increase in the average loading at the level of the HPP from 133 MW to 166 MW (representing the sum of powers produced by all six GUs), which means that the HPP was loaded from 64% to 80% of the total installed power of 210 MW. Also, the total energy production has increased by 20.8%, although the total operating time of all GUs decreased by 5.1%.

Future work will focus on developing a new task associated with modeling the uncertainty of two variables: the hourly power requested by the system and the upstream water level of the reservoir. The modeling of the first variable will be based on the results of a forecasting method whose input data are rainfall, temperature, and historical values recorded in the SCADA database. For the second variable, the models will use the operating regime of the hydro cascade to which the plant belongs and historical data from the other plants. This new task will become a component of the Data Mining-based Knowledge Discovery module, allowing the best strategy for loading the generation units under uncertain conditions to be determined quickly.

Author Contributions

Conceptualization, G.G.; methodology, G.G. and R.G.; software, G.G.; validation, G.G., R.G. and B.-C.N.; formal analysis, R.G. and B.-C.N.; investigation, G.G., R.G. and B.-C.N.; resources, G.G.; data curation, R.G.; writing—original draft preparation, G.G., R.G. and B.-C.N.; writing—review and editing, G.G.; visualization, G.G. and R.G.; supervision, G.G.; project administration, G.G.; funding acquisition, G.G. and B.-C.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Nomenclature

DEMA	Decision-Makers
DM	Data Mining
GU	Generation Unit
HPP	Hydropower Plant
KD	Knowledge Discovery
O&M	Operations and Maintenance
SCADA	Supervisory Control and Data Acquisition
RTU	Remote Terminal Units
WF_Pipe_n	Water flows in the pipe n from the hydropower plant HPP, in [m³/s]
Iex_GUn	Excitation current of the generation unit GUn from the hydropower plant HPP, in [A]
Is_GUn	Stator current of the generation unit GUn from the hydropower plant HPP, in [A]
H_year	the number of hours when at least one GU works, in [hours]
m	mean of the data set
N_HPP^GU	the number of generation units from the hydropower plant
P_GU_Pipe_n	Active power produced by the generation units, GU, which are supplied through the pipe n from the hydropower plant HPP, in [MW]
P_GUn	Active power produced by the generation unit GUn from the hydropower plant HPP, in [MW]
P_req	Requested active power of the system to the hydropower plant HPP, in [MW]
Q0	zeroth quartile (minimum value of the data set)
Q1	first quartile (25%)
Q2	second quartile (50%)
Q3	third quartile (75%)
Q4	fourth quartile (maximum value of the data set)
Q_GU_Pipe_n	Reactive power produced by the generation units, GU, which are supplied through the pipe n from the hydropower plant HPP, in [MVAr]
Q_GUn	Reactive power produced by the generation unit GUn from the hydropower plant HPP, in [MVAr]
Q_req	Requested reactive power of the system to the hydropower plant HPP, in [MVAr]
Vex_GUn	Excitation voltage of the generation unit GUn from the hydropower plant HPP, in [V]
Vs_GUn	Stator voltage of the generation unit GUn from the hydropower plant HPP, in [kV]
WLr_d	Water levels of the reservoir downstream, in [mdMB]
WLr_u	Water levels of the reservoir upstream, in [mdMB]
σ	standard deviation of the dataset

Appendix A

Table A1. The technical parameters of the generation units (GU1–GU6) in the operating regimes of the analyzed HPP within a day recorded in the SCADA database.

Hour	GU1						GU2						GU3
Hour	V^s [kV]	I^s [kA]	P [MW]	Q [MVAr]	V^ex [V]	I^ex [A]	V^s [kV]	I^s [kA]	P [MW]	Q [MVAr]	V^ex [V]	I^ex [A]	V^s [kV]	I^s [kA]	P [MW]	Q [MVAr]	V^ex [V]	I^ex [A]
1	10.5	1.10	20	1	90	290	0	0	0	0	0	0	10.5	1.00	19	1	95	290
2	10.5	1.10	20	1	90	290	0	0	0	0	0	0	10.5	1.00	19	1	95	290
3	10.5	1.10	20	1	90	290	0	0	0	0	0	0	10.5	1.00	19	1	95	290
4	10.5	1.10	20	1	90	290	0	0	0	0	0	0	10.5	1.00	19	1	95	290
5	10.5	1.10	20	1	90	290	0	0	0	0	0	0	10.5	1.00	19	1	95	290
6	10.5	1.10	20	1	90	290	0	0	0	0	0	0	10.5	1.00	19	1	95	290
7	10.5	1.10	20	1	90	290	0	0	0	0	0	0	10.5	1.00	19	1	95	290
8	10.5	1.10	20	1	90	290	10.5	1.10	20	1	100	290	10.5	1.00	19	1	95	290
9	10.5	1.10	20	1	90	290	10.5	1.10	20	1	100	290	10.5	1.00	19	1	95	290
10	10.5	1.10	20	5	110	310	10.5	1.10	20	5	110	300	10.5	1.00	19	5	105	300
11	10.5	1.10	20	5	110	310	10.5	1.10	20	5	110	300	10.5	1.00	19	5	105	300
12	10.5	1.10	20	5	110	310	10.5	1.10	20	5	110	300	10.5	1.00	19	5	105	300
13	10.5	1.10	20	5	110	310	10.5	1.10	20	5	110	300	10.5	1.00	19	5	105	300
14	10.5	1.10	20	5	110	310	10.5	1.10	20	5	110	300	10.5	1.00	19	5	105	300
15	10.5	1.10	20	5	110	310	10.5	1.10	20	5	110	300	10.5	1.00	19	5	105	300
16	10.5	1.10	20	5	110	310	10.5	1.10	20	5	110	300	10.5	1.00	19	5	105	300
17	10.5	1.10	20	5	110	310	0	0	0	0	0	0	10.5	1.00	19	5	105	300
18	10.5	1.10	20	5	110	310	0	0	0	0	0	0	10.5	1.00	19	5	105	300
19	10.5	1.10	20	5	110	310	0	0	0	0	0	0	10.5	1.00	19	5	105	300
20	10.5	1.10	20	5	110	310	0	0	0	0	0	0	10.5	1.00	19	5	105	300
21	10.5	1.10	20	5	110	310	0	0	0	0	0	0	10.5	1.00	19	5	105	300
22	10.5	1.10	20	1	90	290	0	0	0	0	0	0	10.5	1.00	19	1	95	290
23	10.5	1.10	20	1	90	290	0	0	0	0	0	0	10.5	1.00	19	1	95	290
24	10.5	1.10	20	1	90	290	0	0	0	0	0	0	10.5	1.00	19	1	95	290
Hour	GU4						GU5						GU6
Hour	V^s [kV]	I^s [kA]	P [MW]	Q [MVAr]	V^ex [V]	I^ex [A]	V^s [kV]	I^s [kA]	P [MW]	Q [MVAr]	V^ex [V]	I^ex [A]	V^s [kV]	I^s [kA]	P [MW]	Q [MVAr]	V^ex [V]	I^ex [A]
1	0	0	0	0	0	0	10.5	2.20	40	1	110	340	0	0	0	0	0	0
2	0	0	0	0	0	0	10.5	2.20	40	1	110	340	0	0	0	0	0	0
3	0	0	0	0	0	0	10.5	2.20	40	1	110	340	0	0	0	0	0	0
4	0	0	0	0	0	0	10.5	2.20	40	1	110	340	0	0	0	0	0	0
5	0	0	0	0	0	0	10.5	2.20	40	1	110	340	0	0	0	0	0	0
6	0	0	0	0	0	0	10.5	2.20	40	1	110	340	0	0	0	0	0	0
7	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
8	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
9	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
10	10.5	1.00	18	5	100	310	0	0	0	0	0	0	0	0	0	0	0	0
11	10.5	1.00	18	5	100	310	0	0	0	0	0	0	10.5	1.70	30	1	90	300
12	10.5	1.00	18	5	100	310	0	0	0	0	0	0	10.5	1.70	30	1	90	300
13	10.5	1.00	18	5	100	310	0	0	0	0	0	0	10.6	1.70	32	1	95	300
14	10.5	1.00	18	5	100	310	0	0	0	0	0	0	0	0	0	0	0	0
15	10.5	1.00	18	5	100	310	0	0	0	0	0	0	0	0	0	0	0	0
16	10.5	1.00	18	5	100	310	0	0	0	0	0	0	0	0	0	0	0	0
17	0	0	0	0	0	0	10.5	2.20	40	5	120	360	0	0	0	0	0	0
18	0	0	0	0	0	0	10.5	2.20	40	5	120	360	0	0	0	0	0	0
19	0	0	0	0	0	0	10.5	2.20	40	5	120	360	0	0	0	0	0	0
20	0	0	0	0	0	0	10.5	2.20	40	5	120	360	0	0	0	0	0	0
21	0	0	0	0	0	0	10.5	2.20	40	5	120	360	0	0	0	0	0	0
22	0	0	0	0	0	0	10.5	2.20	40	1	110	340	0	0	0	0	0	0
23	0	0	0	0	0	0	10.5	2.20	40	1	110	340	0	0	0	0	0	0
24	0	0	0	0	0	0	10.5	2.20	40	1	110	340	0	0	0	0	0	0

Table A2. The operating regime of Pattern P1 identified through the typical operating profiles divided into the intervals associated with the generating units (GU1, …, GU6), in [MW/MWh].

Hour	GU1	GU2	GU3	GU4	GU5	GU6
1	0.00575	0.00334	0.00319	0.00091	0.01470	0.00181
2	0.00494	0.00315	0.00259	0.00092	0.01385	0.00169
3	0.00479	0.00311	0.00332	0.00093	0.01399	0.00134
4	0.00458	0.00310	0.00282	0.00091	0.01260	0.00108
5	0.00451	0.00294	0.00295	0.00073	0.01334	0.00107
6	0.00562	0.00317	0.00323	0.00071	0.01596	0.00104
7	0.00656	0.00455	0.00367	0.00097	0.02124	0.00122
8	0.00651	0.00510	0.00448	0.00119	0.02218	0.00145
9	0.00698	0.00638	0.00507	0.00131	0.02658	0.00157
10	0.00697	0.00640	0.00523	0.00142	0.02702	0.00176
11	0.00690	0.00660	0.00548	0.00169	0.02510	0.00176
12	0.00719	0.00751	0.00519	0.00188	0.02749	0.00177
13	0.00854	0.00720	0.00524	0.00174	0.02658	0.00194
14	0.00910	0.00592	0.00516	0.00166	0.02548	0.00192
15	0.00909	0.00573	0.00482	0.00163	0.02595	0.00168
16	0.00778	0.00536	0.00486	0.00171	0.02440	0.00169
17	0.00683	0.00566	0.00527	0.00138	0.02433	0.00182
18	0.00693	0.00648	0.00480	0.00151	0.02508	0.00202
19	0.00792	0.00619	0.00485	0.00186	0.02556	0.00212
20	0.00825	0.00558	0.00470	0.00215	0.02507	0.00200
21	0.00879	0.00586	0.00469	0.00192	0.02590	0.00185
22	0.00936	0.00549	0.00470	0.00193	0.02723	0.00174
23	0.00729	0.00483	0.00428	0.00190	0.02345	0.00160
24	0.00580	0.00418	0.00366	0.00102	0.01833	0.00161

Table A3. The operating regime of Pattern P2 identified through the typical operating profiles divided into the intervals associated with the generating units (GU1, …, GU6), in [MW/MWh].

Hour	GU1	GU2	GU3	GU4	GU5
1	0.00714	0.00723	0.00385	0.00000	0.02189
2	0.00595	0.00156	0.00325	0.00058	0.00867
3	0.00353	0.00180	0.00179	0.00051	0.00342
4	0.00401	0.00123	0.00056	0.00099	0.00091
5	0.00263	0.00123	0.00056	0.00091	0.00202
6	0.00287	0.00274	0.00177	0.00148	0.00597
7	0.00334	0.00352	0.00177	0.00119	0.00477
8	0.00547	0.00317	0.00116	0.00115	0.00240
9	0.00720	0.00184	0.00243	0.00115	0.00764
10	0.00775	0.00155	0.00109	0.00167	0.00311
11	0.00583	0.00298	0.00225	0.00109	0.00772
12	0.00741	0.00053	0.00211	0.00058	0.00998
13	0.00683	0.00294	0.00141	0.00058	0.01011
14	0.00674	0.00498	0.00262	0.00148	0.01545
15	0.00816	0.00514	0.00148	0.00106	0.02402
16	0.01109	0.00683	0.00524	0.00102	0.04126
17	0.01380	0.00830	0.00751	0.00510	0.05050
18	0.02052	0.01275	0.00860	0.00575	0.05867
19	0.01872	0.01230	0.01201	0.00280	0.06564
20	0.01555	0.01080	0.01157	0.00328	0.05483
21	0.01391	0.00992	0.00739	0.00281	0.04916
22	0.01258	0.00719	0.00516	0.00181	0.04508
23	0.00832	0.00541	0.00366	0.00146	0.03031
24	0.00575	0.00512	0.00299	0.00146	0.01821

Table A4. The operating regime of Pattern P3 identified through the typical operating profiles divided into the intervals associated with the generating units (GU1, …, GU6), in [MW/MWh].

Hour	GU1	GU2	GU3	GU4	GU5	GU6
1	0.01713	0.00741	0.00655	0.00192	0.01527	0.00062
2	0.01045	0.00646	0.00763	0.00154	0.00945	0.00062
3	0.00950	0.00852	0.00764	0.00099	0.00676	0.00062
4	0.00903	0.00841	0.00805	0.00058	0.00552	0.00062
5	0.00850	0.00765	0.00798	0.00084	0.00623	0.00062
6	0.01580	0.00840	0.00871	0.00144	0.00901	0.00062
7	0.01595	0.00643	0.00792	0.00255	0.01280	0.00095
8	0.01405	0.00548	0.00728	0.00382	0.01031	0.00095
9	0.01227	0.00727	0.01146	0.00465	0.01336	0.00095
10	0.01166	0.00675	0.01059	0.00456	0.01138	0.00095
11	0.01041	0.00494	0.00936	0.00475	0.00521	0.00095
12	0.01072	0.00556	0.00967	0.00458	0.00330	0.00057
13	0.01224	0.00515	0.01046	0.00525	0.00272	0.00058
14	0.01063	0.00528	0.00961	0.00463	0.00108	0.00058
15	0.00989	0.00507	0.00890	0.00402	0.00073	0.00096
16	0.01135	0.00445	0.00857	0.00504	0.00217	0.00066
17	0.01331	0.00520	0.01076	0.00451	0.00385	0.00066
18	0.01266	0.00598	0.00919	0.00431	0.00145	0.00066
19	0.01525	0.00567	0.01098	0.00511	0.00382	0.00064
20	0.01608	0.00567	0.00932	0.00467	0.00467	0.00064
21	0.01984	0.00560	0.01101	0.00747	0.01211	0.00068
22	0.02218	0.00490	0.01453	0.00588	0.01447	0.00068
23	0.02410	0.00390	0.01394	0.00540	0.01314	0.00068
24	0.02336	0.00414	0.01110	0.00468	0.00939	0.00033

Table A5. The operating regime of Pattern P4 identified through the typical operating profiles divided into the intervals associated with the generating units (GU1, …, GU6), in [MW/MWh].

Hour	GU1	GU2	GU3	GU4	GU5	GU6
1	0.01024	0.00938	0.00177	0.00000	0.05302	0.00000
2	0.00779	0.00898	0.00170	0.00226	0.05455	0.00000
3	0.00762	0.00958	0.00215	0.00226	0.05302	0.00083
4	0.00750	0.00668	0.00215	0.00000	0.04260	0.00083
5	0.01061	0.00667	0.00215	0.00000	0.04609	0.00000
6	0.01355	0.00796	0.00170	0.00000	0.05410	0.00000
7	0.01236	0.02001	0.00177	0.00000	0.06093	0.00000
8	0.00967	0.00388	0.00045	0.00000	0.02815	0.00000
9	0.00640	0.00420	0.00263	0.00000	0.02356	0.00000
10	0.00629	0.00347	0.00319	0.00080	0.01695	0.00000
11	0.00747	0.00262	0.00416	0.00000	0.00504	0.00000
12	0.00540	0.00275	0.00133	0.00000	0.00239	0.00000
13	0.00413	0.00272	0.00000	0.00000	0.00143	0.00000
14	0.00335	0.00181	0.00000	0.00063	0.00120	0.00000
15	0.00171	0.00235	0.00167	0.00063	0.00251	0.00000
16	0.00221	0.00313	0.00250	0.00000	0.00566	0.00000
17	0.00300	0.00236	0.00246	0.00000	0.00937	0.00000
18	0.00631	0.00512	0.00550	0.00372	0.01772	0.00083
19	0.00807	0.00492	0.00445	0.00362	0.02117	0.00083
20	0.00883	0.00515	0.00445	0.00301	0.02311	0.00083
21	0.01128	0.00748	0.00488	0.00131	0.02587	0.00083
22	0.01156	0.00647	0.00575	0.00060	0.02382	0.00000
23	0.00453	0.00568	0.00595	0.00000	0.01412	0.00000
24	0.00193	0.00529	0.00431	0.00000	0.01227	0.00000

Table A6. The patterns regarding the hourly loading of the generation units GU1–GU6 in an analyzed three-year period.

Pattern	2017						2018						2019
Pattern	GU1	GU2	GU3	GU4	GU5	GU6	GU1	GU2	GU3	GU4	GU5	GU6	GU1	GU2	GU3	GU4	GU5	GU6
P1	0	0	0	0	34	0	0	21	20	20	40	0	20	0	0	0	39	0
P2	16	16	16	16	33	0	18	0	0	0	0	0	19	19	19	18	0	0
P3	17	0	0	0	35	31	20	20	20	18	0	0	20	19	18	0	0	44
P4	15	16	16	0	33	32	20	20	20	20	39	0	0	20	19	18	42	0
P5	0	0	17	0	35	0	19	19	0	18	38	0	20	0	19	0	39	49
P6	18	0	17	17	35	0	19	0	19	18	38	0	19	0	0	18	0	0
P7	0	18	0	0	36	0	0	20	20	18	0	0	20	20	0	19	42	0
P8	0	17	16	16	34	0	0	20	19	0	40	0	0	0	19	18	41	0
P9	17	0	0	17	34	0	19	0	18	18	0	0	18	18	0	17	0	0
P10	0	0	16	16	34	0	19	0	18	0	38	0	20	20	0	0	41	50
P11	17	0	17	0	34	30	0	0	0	0	36	0	20	20	19	19	42	0
P12	17	0	0	0	0	32	19	0	0	0	38	0	0	0	19	0	39	0
P13	17	17	16	0	0	0	0	17	16	17	0	0	0	20	0	18	40	0
P14	0	17	0	16	34	0	0	0	18	17	0	0	20	20	19	0	42	0
P15	18	0	17	0	0	31	0	18	0	0	36	0	20	0	19	19	41	27
P16	18	18	0	17	35	0	19	20	19	0	39	0	19	19	0	0	0	0
P17	16	16	16	0	32	31	19	0	17	17	37	0	0	19	0	18	0	0
P18	0	0	16	0	0	31	19	19	0	18	0	0	20	0	0	18	39	0
P19	0	17	16	15	0	0	0	20	10	18	39	0	0	0	0	0	40	36
P20	18	18	17	17	0	0	0	0	18	0	38	0	19	0	19	18	0	22
P21	0	15	17	17	0	0	0	0	17	17	35	0	0	20	19	18	0	0
P22	17	17	0	0	33	32	0	0	0	17	35	0	0	0	18	18	0	0
P23	0	0	0	15	0	32	19	0	0	18	0	0	19	0	0	0	0	0
P24	17	17	0	0	0	30	19	19	19	0	0	0	0	20	19	0	41	0
P25	17	17	0	17	0	0	17	18	0	0	36	0	0	0	0	18	40	0

Table A7. Comparison between the features of the patterns in an analyzed three-year period.

Patterns	Operating Time									Powers Required by the System [MW]
	Hours			Days			Hours/Day			Minimum Value			Average Value			Maximum Value
	2017	2018	2019	2017	2018	2019	2017	2018	2019	2017	2018	2019	2017	2018	2019	2017	2018	2019
P1	432	471	323	128	35	49	3	13	7	30	75	42	34	102	63	40	107	59
P2	360	235	539	54	81	55	7	3	10	71	15	60	82	18	79	110	22	76
P3	810	908	276	152	66	42	5	14	7	44	64	47	53	78	108	88	85	58
P4	165	424	467	23	36	45	7	12	10	60	90	89	86	119	100	108	129	99
P5	267	166	276	60	27	29	4	6	10	45	75	67	52	94	129	60	105	79
P6	116	288	47	30	40	16	4	7	3	42	75	34	65	95	40	101	107	37
P7	488	137	372	99	11	34	5	12	11	43	45	88	54	58	103	60	64	101
P8	235	313	429	52	40	52	5	8	8	57	60	63	69	78	80	100	84	78
P9	115	165	209	29	45	27	4	4	8	60	30	48	67	46	59	81	64	54
P10	95	397	129	16	59	30	6	7	4	60	60	73	66	75	132	73	84	82
P11	452	425	516	101	112	38	4	4	14	58	30	103	69	36	122	100	44	120
P12	287	806	101	103	118	16	3	7	6	13	45	47	18	57	62	51	64	58
P13	151	119	131	30	46	19	5	3	7	35	0	55	49	19	81	59	39	71
P14	156	91	149	25	33	21	6	3	7	45	15	92	56	23	102	76	42	100
P15	183	418	243	49	67	22	4	6	11	26	45	86	36	55	128	68	64	99
P16	485	159	186	91	34	42	5	5	4	58	78	28	72	96	42	97	106	38
P17	92	216	131	15	39	42	6	6	3	62	60	15	89	73	40	126	84	23
P18	71	198	355	30	48	40	2	4	9	14	30	58	19	45	82	50	61	77
P19	102	71	280	44	17	64	2	4	4	15	60	30	18	76	90	34	90	40
P20	119	201	191	15	41	32	8	5	6	50	45	33	68	56	93	94	63	49
P21	23	88	117	11	16	20	2	6	6	28	60	35	35	69	58	47	84	56
P22	161	101	58	20	31	21	8	3	3	46	45	15	82	53	38	106	63	27
P23	55	52	121	25	16	48	2	3	3	0	30	15	10	37	21	33	41	19
P24	92	172	34	40	33	8	2	5	4	30	39	75	36	55	82	62	65	80
P25	76	293	179	22	58	44	3	5	4	30	60	46	40	71	61	58	85	58

References

International Hydropower Association. World Hydropower Outlook. Opportunities to Advance Net Zero. 2024. Available online: https://www.hydropower.org/publications/2024-world-hydropower-outlook (accessed on 5 September 2024).
Parvez, I.; Shen, J.; Hassan, I.; Zhang, N. Generation of Hydro Energy by Using Data Mining Algorithm for Cascaded Hy-dropower Plant. Energies 2021, 14, 298. [Google Scholar] [CrossRef]
International Energy Agency. Hydropower Special Market Report Analysis and Forecast to 2030. 2021. Available online: https://iea.blob.core.windows.net/assets/83ff8935-62dd-4150-80a8-c5001b740e21/HydropowerSpecialMarketReport.pdf (accessed on 5 September 2024).
Essenfelder, A.H.; Larosa, F.; Broccoli, D.; Mazzoli, P.; Bagli, S.; Luzzi, V.; Mysiak, J.; dalla Vallw, F. Smart Climate Hydro-power Tool: A Machine-Learning Seasonal Forecasting Climate Service to Support Cost–Benefit Analysis of Reservoir Management. Atmosphere 2020, 11, 1305. [Google Scholar] [CrossRef]
Garbea, R.; Scarlatache, F.; Grigoras, G.; Neagu, B.C. Extracting the Operating Characteristics of Hydropower Plants Using a Clustering-based Efficient Methodology. In Proceedings of the IEEE 9th International Conference on Modern Power Systems (MPS), Cluj-Napoca, Romania, 16–17 June 2021. [Google Scholar]
International Finance Corporation. Hydroelectric Power. A Guide for Developers and Investors. 2024. Available online: https://documents1.worldbank.org/curated/en/917841468188335073/pdf/99392-WP-Box393199B-PUBLIC-Hydropower-Report.pdf (accessed on 5 September 2024).
International Renewable Energy Agency. Renewable Energy Technologies: Cost Analysis Series. Hydropower. 2012. Available online: https://www.irena.org/-/media/Files/IRENA/Agency/Publication/2012/RE_Technologies_Cost_Analysis-HYDROPOWER.pdf (accessed on 5 September 2024).
Eker, O.F. Data Science for Industry: Hydropower Condition Monitoring and Predictive Maintenance. 2022. Available online: https://medium.com/@omerfarukeker/data-science-for-industry-hydropower-condition-monitoring-and-predictive-maintenance-49952215fdd7 (accessed on 31 July 2024).
Quaranta, E.; Aggidis, G.; Boes, R.; Comoglio, C.; De Michele, C.; Ritesh Patro, E.; Georgievskaia, E.; Harby, A.; Kougias, I.; Muntean, S.; et al. Assessing the energy potential of modernizing the European hydropower fleet. Energy Convers. Manag. 2021, 246, 114655. [Google Scholar] [CrossRef]
Betti, A.; Crisostomi, E.; Paolinelli, G.; Piazzi, A.; Ruffini, F.; Tucci, F. Condition Monitoring and Predictive Maintenance Methodologies for Hydropower Plants Equipment. Renew. Energy 2021, 171, 246–253. [Google Scholar] [CrossRef]
European Commission. 2050 Long-Term Strategy. Striving to Become the World’s First Climate-Neutral Continent by 2050. Available online: https://climate.ec.europa.eu/eu-action/climate-strategies-targets/2050-long-term-strategy_en (accessed on 5 September 2024).
Ovarro. Five Ways SCADA Systems Can Benefit Sustainability: Although Hidden in the Background, SCADA Systems Are Crucial to Energy Savings. 2024. Available online: https://www.linkedin.com/pulse/five-ways-scada-systems-can-benefit-sustainability-although-hidden-sbdae/ (accessed on 31 July 2024).
Yang, S.; Stempfle, T.; Thiede, S.; Lanza, G. Approach for the Development of a Sustainability-oriented Implementation Strategy of Smart Automation Technologies. Procedia CIRP 2024, 122, 849–854. [Google Scholar] [CrossRef]
Vagnoni, E.; Gezer, D.; Anagnostopoulos, I.; Cavazzini, G.; Doujak, E.; Hočevar, M.; Rudolf, P. The New Role of Sustainable Hydropower in Flexible Energy Systems and its Technical Evolution Through Innovation And Digitalization. Renew. Energy 2024, 230, 120832. [Google Scholar] [CrossRef]
Feng, Z.K.; Niu, W.J.; Zhang, R.; Wang, S.; Cheng, C.-T. Operation Rule Derivation of Hydropower Reservoir by K-Means Clustering Method and Extreme Learning Machine Based on Particle Swarm Optimization. J. Hydrol. 2019, 576, 229–238. [Google Scholar] [CrossRef]
Zhang, F.; Guo, J.; Yuan, F.; Qiu, Y.; Wang, P.; Cheng, F.; Gu, Y. Enhancement Methods of Hydropower Unit Monitoring Data Quality Based on the Hierarchical Density-Based Spatial Clustering of Applications with a Noise–Wasserstein Slim Generative Adversarial Imputation Network with a Gradient Penalty. Sensors 2024, 24, 118. [Google Scholar] [CrossRef] [PubMed]
Luo, W.; Xu, J.; Zhou, Z. Mobile Information Systems, Retracted: Design of Data Classification and Classification Management System for Big Data of Hydropower Enterprises Based on Data Standards, Mobile Information Systems. 2022. Available online: https://onlinelibrary.wiley.com/doi/10.1155/2022/8103897 (accessed on 31 July 2024).
Ahmed, I.; Dagnino, A.; Bongiovi, A.; Ding, Y. Outlier Detection for Hydropower Generation Plant. In Proceedings of the IEEE 14th International Conference on Automation Science and Engineering (CASE), Munich, Germany, 20–24 August 2018. [Google Scholar]
Valencia, A.M.; Caratar, J.; Caicedo, G.; Chamorro, C. Proposal for a KDD-Based Procedure to Obtain a Set of Intelligent Systems Training Applied to the Identification of Failures in Hydroelectric Power Plants. J. Appl. Res. Technol. 2021, 18, 376–389. [Google Scholar] [CrossRef]
Zhang, W.; Ge, Y.; Liu, G.; Qi, W.; Xu, S.; Peng, Z.; Li, Y. Clustering and Decision Tree Based Analysis of Typical Operation Modes of Power Systems. Energy Rep. 2023, 9, 60–69. [Google Scholar] [CrossRef]
Garbea, R.; Scarlatache, F.; Grigoras, G.; Neagu, B.C. Integration of Data Mining Techniques in SCADA System for Optimal Operation of Hydropower Plants. In Proceedings of the IEEE 13th International Conference on Electronics, Computers and Artificial Intelligence (ECAI), Pitesti, Romania, 1–3 July 2021. [Google Scholar]
Sahin, M.E.; Ozbay Karakus, M. Smart Hydropower Management: Utilizing Machine Learning and Deep Learning Method to Enhance Dam’s Energy Generation Efficiency. Neural Comput. Appl. 2024, 36, 11195–11211. [Google Scholar] [CrossRef]
Shu, X.; Ye, Y. Knowledge Discovery: Methods from Data Mining and Machine Learning. Soc. Sci. Res. 2023, 110, 102817. [Google Scholar] [CrossRef] [PubMed]
Monika; Shauib, M. Implementation Platforms and Strategy for the Knowledge Discovery from the Data. In Proceedings of the International Conference on Computational Modelling, Simulation and Optimization (ICCMSO), Pathum Thani, Thailand, 23–25 December 2022. [Google Scholar]
Ghongade, T.G.; Khobragade, R.N. Evaluation on Utilization and Emaciation of Data Mining Techniques in Information System. In Proceedings of the OPJU International Technology Conference on Emerging Technologies for Sustainable Development (OTCON), Raigarh, Chhattisgarh, India, 8–10 February 2023. [Google Scholar]
Järvinen, P.; Siltanen, P.; Kirschenbaum, A. Data Analytics and Machine Learning. In Big Data in Bioeconomy; Södergård, C., Mildorf, T., Habyarimana, E., Berre, A.J., Fernandes, J.A., Zinke-Wehlmann, C., Eds.; Springer Nature: Cham, Switzerland, 2021; pp. 129–146. [Google Scholar]
Garbea, R.; Grigoras, G. Clustering-Using Data Mining-based Application to Identify the Hourly Loading Patterns of the Generation Units from the Hydropower Plants. In Proceedings of the IEEE International Conference and Exposition on Electrical and Power Engineering (EPE), Iasi, Romania, 20–22 October 2022. [Google Scholar]
Odrynska, A. What is Data Mining: Definition, Process, Techniques and Role in Business Intelligence. 2023. Available online: https://www.alphaservesp.com/blog/what-is-data-mining-definition-process-techniques-and-business-intelligence (accessed on 31 July 2024).
Onlogic. Setting Up Smart SCADA for Digital Transformation. 2024. Available online: https://www.onlogic.com/blog/smart-scada-digital-transformation (accessed on 5 September 2024).
Kaur, S.; Kathpal, N.; Munjal, N. Role of SCADA in Hydro Power Plant Automation. Int. J. Adv. Res. Electr. Electron. Instrum. Eng. 2015, 4, 8085–8090.
Mirzargar, M.; Whitaker, R.T.; Kirby, R.M. Curve Boxplot: Generalization of Boxplot for Ensembles of Curves. IEEE Trans. Vis. Comput. Gr. 2014, 20, 2654–2663.
Chelaru, E.; Grigoras, G. Decision Support System to Determine the Replacement Ranking of the Aged Transformers in Electric Distribution Networks. In Proceedings of the IEEE 12th International Conference on Electronics, Computers and Artificial Intelligence (ECAI), Bucharest, Romania, 25–27 June 2020.
Neagu, B.C.; Grigoras, G.; Scarlatache, F. Outliers Discovery from Smart Meters Data Using a Statistical Based Data Mining Approach. In Proceedings of the IEEE 10th International Symposium on Advanced Topics in Electrical Engineering (ATEE), Bucharest, Romania, 23–25 April 2017.
Wang, Z.; Wang, S.; Zhang, S.; Zhan, J. An Expert System Based on Data Mining for a Trend Diagnosis of Process Parameters. Processes 2023, 11, 3311.
Dandea, V.; Grigoras, G. Expert System Integrating Rule-Based Reasoning to Voltage Control in Photovoltaic-Systems-Rich Low Voltage Electric Distribution Networks: A Review and Results of a Case Study. Appl. Sci. 2023, 13, 6158.
Dunca, G.; Ghergu, C.M.; Rosioru, O.; Bucur, M.D. Analysis of the Areas with Optimal Working of Aggregates in CHE Stejaru. Symposium on Informatics, Automation and Telecommunications in Energy, Sinaia, Romania, 2010. Available online: https://www.researchgate.net/publication/281646497_Analiza_zonelor_cu_functionare_optima_ale_agregatelor_din_CHE_Stejaru (accessed on 31 July 2024). (In Romanian).
Cojoc, G.M. Analysis of the Hydrological Regime of the Bistrita River in the Context of Hydrotechnical Developments; Terra Nostra Publishing House: Iasi, Romania, 2016.

Figure 1. The steps of a KD process.

Figure 2. Data Mining techniques.

Figure 3. Interdependencies between the Knowledge Discovery and Smart SCADA.

Figure 4. The basic structure of an automation architecture including the SCADA system.

Figure 5. The multi-task framework integrated into the Knowledge Discovery module.

Figure 6. The fields of the SCADA database.

Figure 7. The hydro arrangement of which the analyzed plant is a part (adapted from [36,37]).

Figure 8. SCADA file associated with a day from the database.

Figure 9. Summary information on the operation of the plant: the number of operating hours and the total energy produced by each generation unit.

Figure 10. The boxplots corresponding to the loading of the generation units over a period of one year.

Figure 11. The downstream water level of the reservoir: (a) values taken from the database, containing outliers; (b) values after data processing, without outliers.

Figure 12. The boxplots corresponding to the downstream water level of the reservoir.

Figure 13. The typical operating profile of the hydropower plant assigned to pattern P1.

Figure 14. The typical operating profile of the hydropower plant assigned to pattern P2.

Figure 15. The typical operating profile of the hydropower plant assigned to pattern P3.

Figure 16. The typical operating profile of the hydropower plant assigned to pattern P4.

Figure 17. The patterns obtained for the hourly loading of the generation units in 2017.

Figure 18. The patterns obtained for the hourly loading of the generation units in 2018.

Figure 19. The patterns obtained for the hourly loading of the generation units in 2019.

Figure 20. The active power requested by the system.

Figure 21. The active power distributed among the six GUs: the strategy adopted by the decision-maker (DM) without the Knowledge Discovery module.

Figure 22. The active power distributed among the six GUs: the strategy adopted by the decision-maker (DM) based on the Knowledge Discovery module.

Table 1. The extracted results from the statistical analysis.

Statistical Parameters		m (mean)	σ (standard deviation)	Q0 (minimum)	Q1 (first quartile)	Q2 (median)	Q3 (third quartile)	Q4 (maximum)
Water flow	WF_Pipe1 [m³/s]	36.42	9.75	18.70	31.50	34.50	36.30	74.40
	WF_Pipe2 [m³/s]	29.92	14.02	13.40	17.30	30.80	36.60	78.20
	Total [m³/s]	56.14	21.35	10.50	44.50	54.10	70.80	133.20
Active and reactive powers	GU1-GU4 [MW]	29.60	13.54	0.90	18.00	30.00	37.00	78.00
	GU5 [MW]	34.33	3.09	29.00	31.00	35.00	37.00	40.00
	GU6 [MW]	31.51	2.01	1.00	30.00	31.00	33.00	40.00
	Total [MW]	56.34	20.39	13.00	45.00	56.00	72.00	126.00
	Total [Mvar]	6.39	5.90	1.00	2.00	3.00	10.00	30.00
Frequency	[Hz]	49.99	0.02	49.10	49.98	50.00	50.01	50.40
GU 1	Vs [kV]	10.43	0.71	1.40	10.40	10.40	10.50	50.00
	Is [kA]	0.96	0.11	0.08	0.90	0.95	1.00	1.90
	P [MW]	17.20	1.89	0.90	15.00	17.00	18.00	22.00
	Q [Mvar]	2.60	1.99	1.00	1.00	1.00	5.00	11.00
	Vex [V]	88.28	8.80	8.00	80.00	90.00	95.00	110.00
	Iex [A]	287.91	17.29	100.00	280.00	290.00	300.00	360.00
GU 2	Vs [kV]	10.48	0.38	1.10	10.50	10.50	10.50	10.70
	Is [kA]	0.96	0.10	0.70	0.90	0.95	1.05	1.20
	P [MW]	17.35	1.86	14.00	16.00	17.00	19.00	21.00
	Q [Mvar]	2.58	1.99	1.00	1.00	1.00	5.00	21.00
	Vex [V]	89.84	7.80	70.00	85.00	90.00	95.00	110.00
	Iex [A]	289.65	15.01	245.00	280.00	290.00	300.00	320.00
GU 3	Vs [kV]	10.41	0.42	1.60	10.40	10.50	10.50	10.70
	Is [kA]	0.91	0.08	0.70	0.85	0.90	0.95	1.15
	P [MW]	16.48	1.41	13.00	15.00	16.00	17.00	20.00
	Q [Mvar]	2.76	2.00	0.50	1.00	1.00	5.00	10.00
	Vex [V]	89.60	40.80	65.00	80.00	90.00	95.00	870.00
	Iex [A]	289.21	58.37	115.00	280.00	290.00	300.00	2980.00
GU 4	Vs [kV]	10.45	0.09	10.10	10.40	10.50	10.50	10.60
	Is [kA]	0.92	0.08	0.75	0.85	0.90	1.00	1.10
	P [MW]	16.59	1.36	14.00	15.00	17.00	18.00	20.00
	Q [Mvar]	2.78	1.99	1.00	1.00	1.00	5.00	5.00
	Vex [V]	87.38	8.87	60.00	80.00	90.00	95.00	105.00
	Iex [A]	290.13	15.75	260.00	280.00	290.00	300.00	330.00
GU 5	Vs [kV]	10.41	0.52	1.40	10.40	10.50	10.50	10.70
	Is [kA]	1.89	0.17	0.85	1.70	1.90	2.05	2.70
	P [MW]	34.33	3.09	29.00	31.00	35.00	37.00	40.00
	Q [Mvar]	2.56	1.96	1.00	1.00	1.00	5.00	10.00
	Vex [V]	104.78	17.89	10.00	100.00	100.00	110.00	1110.00
	Iex [A]	332.53	108.88	235.00	315.00	330.00	340.00	3210.00
GU 6	Vs [kV]	10.49	0.07	10.30	10.50	10.50	10.50	10.60
	Is [kA]	1.70	0.07	1.50	1.65	1.70	1.75	1.85
	P [MW]	31.52	1.33	29.00	30.00	31.00	33.00	33.00
	Q [Mvar]	2.07	1.77	1.00	1.00	1.00	5.00	5.00
	Vex [V]	94.47	6.15	80.00	90.00	95.00	100.00	110.00
	Iex [A]	303.34	10.77	280.00	300.00	300.00	310.00	340.00
Water level	WLr_u [mdMB]	492.66	5.86	479.28	489.80	493.40	497.99	500.80
	WLr_d [mdMB]	368.21	5.90	356.43	368.90	369.06	369.20	497.48
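Table 1 summarizes each SCADA variable by its mean m, standard deviation σ and the quartile points Q0 to Q4, which are also the quantities drawn in the boxplots of Figures 10 and 12. The following Python sketch computes the same statistics and flags outliers with Tukey's boxplot rule (values beyond Q1 − 1.5·IQR or Q3 + 1.5·IQR). It is an illustration under stated assumptions, not the authors' procedure: the 1.5 factor, the sample (n − 1) standard deviation and the "inclusive" quartile interpolation are choices made here, and the function names are hypothetical. Applied to the Q1 and Q3 of the downstream level WLr_d from Table 1, both the recorded minimum (356.43) and maximum (497.48) fall outside the fences, which agrees with Figure 11(a) showing outliers in the raw downstream level data.

```python
import statistics


def summary(values):
    """Table 1 style statistics: mean m, sample standard deviation sigma,
    and the quartile points Q0 (minimum) to Q4 (maximum)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "m": statistics.mean(values),
        "sigma": statistics.stdev(values),  # sample (n - 1) estimator, an assumption
        "Q0": min(values),
        "Q1": q1,
        "Q2": q2,
        "Q3": q3,
        "Q4": max(values),
    }


def tukey_fences(q1, q3, k=1.5):
    """Lower and upper boxplot fences; k = 1.5 is Tukey's conventional choice."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return the values lying outside the boxplot fences of the series."""
    s = summary(values)
    low, high = tukey_fences(s["Q1"], s["Q3"], k)
    return [v for v in values if v < low or v > high]


# Toy series with one spike, e.g., a corrupted SCADA record.
print(flag_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))  # [100]

# Fences for the downstream water level WLr_d, using Q1 and Q3 from Table 1 [mdMB].
low, high = tukey_fences(368.90, 369.20)
print(round(low, 2), round(high, 2))  # 368.45 369.65
print(356.43 < low, 497.48 > high)    # True True
```

Because the fences depend only on the quartiles, which are robust to a few extreme records, this rule is less distorted by the outliers themselves than a mean ± kσ rule would be; note, for instance, how the σ of GU3 Iex (58.37 A) in Table 1 is inflated by a single 2980 A maximum.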

Table 2. Hourly differences in the active power [MW] of each GU between the experience-based strategy and the expert system-based strategy.

Hour	P_GU1 [MW]	P_GU2 [MW]	P_GU3 [MW]	P_GU4 [MW]	P_GU5 [MW]	P_GU6 [MW]
1	+16	0	−16	0	0	0
2	+16	0	−16	0	0	0
3	+16	0	−16	0	0	0
4	+16	0	−16	0	0	0
5	+16	0	−16	0	0	0
6	+16	0	−16	0	0	0
7	+16	0	−16	0	0	0
8	−1	+17	+1	+17	−34	0
9	−1	+17	+1	+17	−34	0
10	−1	+17	+1	+17	−34	0
11	−1	0	−1	0	−34	+36
12	−1	0	−1	0	−34	+36
13	+16	0	−16	0	0	0
14	+16	0	−16	0	0	0
15	+16	0	−16	0	0	0
16	+16	0	−16	0	0	0
17	0	0	+2	0	+38	−40
18	0	0	+2	0	+38	−40
19	+16	0	−16	0	0	0
20	+15	+16	0	+15	0	−46
21	+15	+16	0	+15	0	−46
22	+15	+16	0	+15	0	−46
23	+15	+16	0	+15	0	−46
24	+15	+16	0	+15	0	−46
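Assuming both strategies cover the same active power requested by the system (Figure 20), the expert system can only move load between units, so every hourly row of Table 2 should sum to zero. The Python sketch below checks this on the rows copied from the table and also sums each column, giving the energy shifted per unit over the day at an assumed 1-hour resolution. The table does not state which strategy is subtracted from which, so the signs of the column totals are relative; the helper names are hypothetical.

```python
# Rows of Table 2: hourly differences [MW] for P_GU1 .. P_GU6.
A = (16, 0, -16, 0, 0, 0)      # hours 1-7, 13-16 and 19
B = (-1, 17, 1, 17, -34, 0)    # hours 8-10
C = (-1, 0, -1, 0, -34, 36)    # hours 11-12
D = (0, 0, 2, 0, 38, -40)      # hours 17-18
E = (15, 16, 0, 15, 0, -46)    # hours 20-24
DIFFS = [A] * 7 + [B] * 3 + [C] * 2 + [A] * 4 + [D] * 2 + [A] + [E] * 5


def unbalanced_hours(diffs):
    """Hours (1-based) whose differences do not cancel out across the units."""
    return [h for h, row in enumerate(diffs, start=1) if sum(row) != 0]


def daily_shift_mwh(diffs):
    """Per-unit energy shifted over the day, assuming 1-hour intervals."""
    return [sum(col) for col in zip(*diffs)]


assert len(DIFFS) == 24
print(unbalanced_hours(DIFFS))  # []
print(daily_shift_mwh(DIFFS))   # [262, 131, -187, 126, -94, -238]
```

Every hour balances, and the column totals also cancel out, so the two strategies differ only in how the same daily energy is split: load moves mainly toward GU1, GU2 and GU4 and away from GU3, GU5 and GU6 (or the reverse, depending on the sign convention).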

Table 3. Specific indicators obtained for each GU from the operation of the HPP in 2020.

Generation Unit	GU1	GU2	GU3	GU4	GU5	GU6	HPP
Operating time [hours]	3460	1970	1520	1484	1271	3255	12960
Energy production [GWh]	72.33	40.60	29.30	28.26	46.37	161.85	378.71
Average loading [MW]	21	21	19	19	36	50	166
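The rows of Table 3 are linked by average loading = energy production / operating time. The energy figures are only consistent with the reported MW loadings when read in GWh; read in TWh, the implied loadings would be 1000 times larger. The Python sketch below recomputes the average loading of each GU from the energy and hours, and checks the HPP column, which appears to be the plain sum of the unit values. For the average loading this means the 166 MW is a sum of unit averages rather than the plant's mean output. Variable and function names are illustrative only.

```python
UNITS = ["GU1", "GU2", "GU3", "GU4", "GU5", "GU6"]
HOURS = [3460, 1970, 1520, 1484, 1271, 3255]              # operating time [h]
ENERGY_GWH = [72.33, 40.60, 29.30, 28.26, 46.37, 161.85]  # energy production [GWh]
REPORTED_MW = [21, 21, 19, 19, 36, 50]                    # average loading [MW]


def average_loading_mw(energy_gwh, hours):
    """Average loading over the operating hours: energy [MWh] / time [h]."""
    return energy_gwh * 1000.0 / hours


for unit, e, h, p in zip(UNITS, ENERGY_GWH, HOURS, REPORTED_MW):
    print(f"{unit}: {average_loading_mw(e, h):.1f} MW (reported {p} MW)")

# HPP column: totals of the unit rows.
print(sum(HOURS), round(sum(ENERGY_GWH), 2), sum(REPORTED_MW))  # 12960 378.71 166
```

Rounded to whole megawatts, the recomputed averages reproduce the reported 21, 21, 19, 19, 36 and 50 MW, which supports reading the energy row in GWh.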

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Grigoras, G.; Gârbea, R.; Neagu, B.-C. Toward Smart SCADA Systems in the Hydropower Plants through Integrating Data Mining-Based Knowledge Discovery Modules. Appl. Sci. 2024, 14, 8228. https://doi.org/10.3390/app14188228
