Review

Overview of Wind and Photovoltaic Data Stream Classification and Data Drift Issues

1 Yunnan Electric Power Dispatching and Control Center, Kunming 650011, China
2 School of Energy and Power Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
3 China-EU Institute for Clean and Renewable Energy, Huazhong University of Science and Technology, Wuhan 430074, China
* Authors to whom correspondence should be addressed.
Energies 2024, 17(17), 4371; https://doi.org/10.3390/en17174371
Submission received: 22 July 2024 / Revised: 27 August 2024 / Accepted: 29 August 2024 / Published: 1 September 2024
(This article belongs to the Special Issue Advances in Renewable Energy Power Forecasting and Integration)

Abstract:
The development of clean energy, particularly wind and photovoltaic power, generates large volumes of data streams, and mining valuable information from these data to improve generation efficiency has become a focus of current research. Traditional classification algorithms cannot cope with dynamically changing data streams, so data stream classification techniques are particularly important. Current data stream classification techniques mainly include decision trees, neural networks, Bayesian networks, and other methods, which have been applied to wind and photovoltaic power data processing in existing research. However, because the data change dynamically, the data drift problem has become increasingly prominent and significantly degrades the performance of classification algorithms. This paper reviews the latest research on data stream classification technology in wind power and photovoltaic applications and provides a detailed introduction to the data drift problem in machine learning. The discussion covers covariate drift, prior probability drift, and concept drift, analyzing their potential impact on the practical deployment of data stream classification methods in the wind and photovoltaic power sectors. Finally, by analyzing examples of addressing data drift in energy-system data stream classification, the article highlights the future prospects of data drift research in this field and suggests areas for improvement. Combined with the systematic overview of data stream classification techniques and data drift handling presented here, it offers valuable insights for future research.

1. Introduction

To address global warming and the climate crisis, humanity has had to reduce the use of fossil fuels and shift towards the development of renewable energy. Taking wind and solar energy as examples, as of 2022, the global installed capacity of wind and solar power had reached 1951.94 GW [1], as shown in Figure 1. With the development of clean energy, the amount of data generated in the field of wind and photovoltaic power prediction has grown explosively, resulting in massive data streams. How to mine valuable information from these massive data streams is a problem that has received extensive attention [2]. Data streams reflect real-time information, and compared with traditional classification algorithms, data stream classification has the advantage that the classification model can be continuously adjusted according to dynamic changes in the data [3]. Classification is an important form of data analysis that predicts the class labels of unknown data from existing data. In traditional classification algorithms, once the classification model is trained, it becomes fixed and no further adjustments are made; such a model cannot cope with dynamically changing data streams. Data streams differ from traditional static data in that they are typically unbounded in volume, arrive rapidly, and are subject to concept drift. Therefore, mining knowledge and patterns of interest from data streams requires a brand-new algorithmic framework.
For wind and photovoltaic power generation, to improve the efficiency of power generation, generator sets are often installed in remote suburban areas far from urban centers, resulting in slow detection of generator failures and difficulties in maintaining generator sets. According to statistics, due to manufacturing and installation errors, unit aging, harsh environments, extreme weather, etc., each wind turbine generator set experiences an average downtime of 52 to 237 h per year due to faults, which has a huge impact on the economic and social benefits of wind farms [4]. To ensure the normal operation of wind turbine generators (WTGs) and minimize their downtime, the traditional solution is to create regular inspection plans based on the service life, health condition, and failure probability of WTG components, and then perform regular manual overhauls and maintenance according to these plans. On the one hand, this maintenance method cannot provide timely warnings of turbine failures; on the other hand, it relies heavily on the experience and common sense of the maintenance staff, resulting in an inability to make quick and accurate analyses of turbine failures [5]. A similar problem exists for photovoltaic power generation, seriously hampering its power generation efficiency.
In recent years, the rapid advancement of IoT and big data technologies has enabled the real-time recording and processing of generator operating status indicators through multiple sensors on server clusters. Mining and analyzing massive real-time data to explore the patterns of generator operating status can help to some extent in monitoring the operating status of generators.
In the field of data mining, data stream classification is one of the most important research topics and is crucial for data analysis and pattern recognition, among other purposes. Classification can be applied in various settings, such as spam filtering, mineral type identification, risk prediction, and oil and water layer identification in geology. Among the many methods used to solve classification problems, each has its advantages and shortcomings; strictly speaking, no classification algorithm can "perfectly" solve every type of classification problem in all fields. With the advent of the big data era, traditional small-sample machine learning methods encounter difficulties in data stream mining tasks. Therefore, algorithms for data stream mining emerged and have been widely used in many fields, including decision trees, neural networks, and Bayesian networks.
With the generation of massive power data, the data drift problem has gained increasing attention in the field of machine learning in recent years and has become one of the main challenges faced by machine learning models in production. When the distribution of data changes gradually over time, the distribution of the data to be predicted shifts away from that of the training data; this constitutes the data drift problem. Briefly, data drift can be divided into three categories: covariate drift, prior probability drift, and concept drift. Covariate drift refers to a shift in the input distribution: P(X) changes while the conditional class distribution P(Y|X) remains unchanged. Prior probability drift is the converse: the class prior P(Y) changes while P(X|Y) remains unchanged. Concept drift means that the relationship between the features and the target variable itself shifts, so that the joint distribution changes over time, i.e., P_t(X, Y) ≠ P_{t+1}(X, Y). More precise definitions are given later.
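As a concrete illustration of how covariate drift can be detected in practice (a minimal sketch on synthetic data, not drawn from any cited study), a two-sample Kolmogorov-Smirnov test can flag a shift in P(X) between a reference window and a current window of the stream:

```python
# Sketch: detecting covariate drift with a two-sample Kolmogorov-Smirnov test
# on a single feature. Under covariate drift, P(X) changes while P(Y|X) is
# fixed. All data here are synthetic and purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.normal(8.0, 2.0, 1000)   # training-period feature, e.g. wind speed
x_same = rng.normal(8.0, 2.0, 1000)    # later window, same distribution
x_shift = rng.normal(11.0, 2.0, 1000)  # later window, shifted distribution

def ks_statistic(a, b):
    """Two-sample KS statistic: maximum gap between the two empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

def covariate_drift_detected(ref, cur, c_alpha=1.95):
    # c_alpha ≈ 1.95 corresponds to a significance level of about 0.001
    n, m = len(ref), len(cur)
    threshold = c_alpha * np.sqrt((n + m) / (n * m))
    return ks_statistic(ref, cur) > threshold

print(covariate_drift_detected(x_train, x_same))   # same distribution
print(covariate_drift_detected(x_train, x_shift))  # shifted distribution
```

In a streaming deployment the reference window would be refreshed whenever the model is retrained, so the test always compares current inputs against the distribution the model was fitted on.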
The remainder of this paper is organized as follows. Section 2 introduces the research progress in data stream classification and its applications in wind power and photovoltaics; Section 3 reviews the current research status of the data stream drift problem; finally, the paper concludes with a summary and an outlook on the future development of wind and photovoltaic data streams.

2. Research on Data Stream Classification Methods and Their Applications

2.1. Data Stream Classification Methods

With the development of the big data era, decision trees, neural networks, and Bayesian networks have become commonly used algorithms for data stream mining tasks.
A decision tree [6,7], traditionally used for classification, forms branches based on the values of each sample's attributes, grouping similar samples under the same branch. Owing to its nonparametric nature, the decision tree is less susceptible to outliers, can handle linearly inseparable data, and offers high performance with relatively low computational requirements. However, it struggles with high-dimensional data and is prone to overfitting. A neural network algorithm [8] simulates the structure of neurons to process information, making it suitable for problems characterized by nonlinear relationships. Conventional algorithms are often constrained by assumptions such as normality, linearity, and variable independence; in contrast, neural networks are more versatile and can capture a diverse array of relationships. Common neural network algorithms include back-propagation neural networks, probabilistic neural networks, and complementary neural networks. The first is the most widely used, but it tends to train more slowly than the others and can be problematic in very large networks with large amounts of data. The Bayesian network [9] combines network structure with probabilistic statistics and can handle problems involving uncertainty; thanks to its simplicity, efficiency, and probabilistic expressiveness, it is one of the most classical methods in data mining. However, its structure is complex and learning it is inefficient, which motivated the naive Bayes algorithm [10]. Naive Bayes assumes strong independence between attributes given the class. When this assumption holds, this type of classifier converges faster than discriminative models (e.g., logistic regression) and requires less training time. Unlike neural networks or support vector machines (SVMs), it has no parameters to tune, which greatly simplifies the algorithm. In practice, however, interactions between attributes often need to be considered, which limits the naive Bayes classification algorithm.
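The independence assumption makes training trivial: each attribute's class-conditional distribution is estimated separately, and prediction multiplies the per-attribute likelihoods (summed in log space). The following minimal Gaussian naive Bayes sketch on synthetic data illustrates this factorization; it is illustrative only, not code from the cited works:

```python
# Minimal Gaussian naive Bayes: under the independence assumption, the
# class-conditional likelihood factorizes into per-attribute Gaussians, so
# training reduces to per-feature means and variances.
import numpy as np

class GaussianNaiveBayes:
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.theta_, self.var_, self.prior_ = {}, {}, {}
        for c in self.classes_:
            Xc = X[y == c]
            self.theta_[c] = Xc.mean(axis=0)      # per-feature mean
            self.var_[c] = Xc.var(axis=0) + 1e-9  # per-feature variance
            self.prior_[c] = len(Xc) / len(X)
        return self

    def predict(self, X):
        scores = []
        for c in self.classes_:
            # log P(c) + sum_j log N(x_j; mu_jc, var_jc)
            log_lik = -0.5 * (np.log(2 * np.pi * self.var_[c])
                              + (X - self.theta_[c]) ** 2 / self.var_[c])
            scores.append(np.log(self.prior_[c]) + log_lik.sum(axis=1))
        return self.classes_[np.argmax(scores, axis=0)]

# Two well-separated synthetic classes in three dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(3, 1, (100, 3))])
y = np.array([0] * 100 + [1] * 100)
model = GaussianNaiveBayes().fit(X, y)
print((model.predict(X) == y).mean())  # training accuracy
```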
Researchers have proposed many improved algorithms in several directions, such as structure extension, instance weighting, attribute selection, and attribute weighting. The state of research in each direction is described in detail below.
In terms of structural extension, the most classical model is the tree-augmented naive Bayes model proposed by Geiger [11] in 1992, which has a more complex structure than the naive Bayes model. Friedman et al. [12] studied the tree-augmented naive Bayes model and simplified it to achieve a structural relaxation of the independence assumption. Webb et al. [13] proposed the averaged one-dependence estimators model, whose advantage is that there is no need to learn the topology between attribute nodes: each attribute node is treated in turn as the parent of the remaining nodes, a base classifier is constructed for each attribute, and all base classifiers are averaged at prediction time, making the algorithm simpler and more efficient.
In terms of instance weighting, the core of improving naive Bayes classification lies in the weighting strategy for the instances [14]. Jiang et al. [15] proposed discriminatively weighted naive Bayes, which improves naive Bayes by weighting the instances discriminatively: the weights of the instances are evaluated using classification accuracy and updated during each iteration. Cai et al. [16] weighted each training instance based on the similarity between its pattern and those of the remaining training instances; classifying the updated instance set greatly improves the efficiency of the algorithm.
Instance cloning can be considered a special case of instance weighting in which each training instance receives a different number of clones, depending on its importance during classification. Zhang et al. [17] proposed a local naive Bayes algorithm based on instance cloning. Cloning is based on the similarity between an instance and the test instance within its neighborhood, and the classifier is constructed on the instance set expanded by the clones. Instance similarity is measured with various metric functions and does not change the class distribution [18].
In terms of attribute selection, improvements to naive Bayes classification focus on the methods and metrics for selecting attribute subsets, i.e., how to find the best subset within the full set of attributes. One of the most classical approaches is the correlation-based feature selection (CFS) method proposed by Langley and Sage [19], which uses forward selection to choose an effective attribute subset and builds a classifier on it. Pazzani [20] complemented this with backward elimination, so that both efficiency and a better attribute subset can be obtained. Bidi and Elberrichi [21] used a genetic-algorithm-based optimization method to obtain the optimal attribute subset; the algorithm was validated on a text dataset and achieved higher accuracy with a smaller selected subset. Dubey and Saxena [22] proposed a clustering-based filtering algorithm for attribute selection in multiclass high-dimensional datasets: a K-means algorithm clusters attributes based on cosine similarity, and information gain is then applied to the best attributes identified. Compared with the original information gain-based filtering method, this algorithm produces higher classification accuracy with a sufficiently small attribute selection. Chuang et al. [23] proposed a binary particle swarm optimization algorithm improved with the catfish effect: when the globally optimal particle cannot be improved within a given number of iterations, 10% of the inferior particles are exchanged for newly generated particles. Experimental results show that this algorithm outperforms the improved hybrid genetic algorithms (HGAs) proposed by Oh et al. [24]. Unler et al. [25] proposed a particle swarm algorithm that combines the advantages of filter and wrapper methods to achieve better classification results.
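A minimal sketch of the greedy forward-selection idea underlying the wrapper methods above (illustrative only, not the CFS algorithm itself): attributes are added one at a time and kept only if they improve holdout accuracy. A simple nearest-centroid classifier and synthetic data stand in for the evaluators used in the cited studies:

```python
# Greedy forward attribute selection (wrapper-style sketch). Attributes are
# added one at a time; an attribute is kept only if it improves the accuracy
# of a simple nearest-centroid classifier on a 50/50 holdout split.
import numpy as np

def holdout_accuracy(X, y, cols, seed=0):
    """Nearest-centroid accuracy on a 50/50 holdout using only columns `cols`."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    tr, te = idx[: len(y) // 2], idx[len(y) // 2:]
    Xtr, Xte = X[tr][:, cols], X[te][:, cols]
    centroids = {c: Xtr[y[tr] == c].mean(axis=0) for c in np.unique(y)}
    preds = [min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))
             for x in Xte]
    return (np.array(preds) == y[te]).mean()

def forward_select(X, y):
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    improved = True
    while improved and remaining:
        improved = False
        scores = {j: holdout_accuracy(X, y, selected + [j]) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] > best:           # keep only strict improvements
            best = scores[j_best]
            selected.append(j_best)
            remaining.remove(j_best)
            improved = True
    return selected, best

# Synthetic data: features 0 and 1 are informative, features 2-4 are noise.
rng = np.random.default_rng(2)
X = rng.normal(0, 1, (400, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[:, 0] += y  # make features 0 and 1 class-dependent
X[:, 1] += y
cols, acc = forward_select(X, y)
print(sorted(cols), round(acc, 2))
```

On this data the first attribute selected is one of the two informative features, since the noise features contribute nothing to holdout accuracy.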
In attribute weighting, the core idea for improving naive Bayes is to assign different weights to attributes according to their importance for classification; the focus of attribute-weighted naive Bayes classification is therefore the weighting process itself. Yan et al. [26] proposed a naive Bayes model based on attribute bi-weighting, using the Normalized Likelihood Allocation (NLA) algorithm to set attribute weights automatically; they tested the effect of each attribute on different class labels and achieved good results in multilabel classification. Wu et al. [27] proposed an attribute weighting algorithm based on differential evolution, which uses four parameters, namely maximum iterations, initial population size, mutation probability, and crossover probability, to measure the weights and updates them continuously until the optimal result is reached. Jiang et al. [28] introduced deep feature weighting for naive Bayes, which to some extent weakens the attribute independence assumption; however, they did not consider the frequency distribution. Taheri et al. [29] proposed a new attribute-weighted naive Bayes classifier that weights the conditional probabilities: based on the structure and attribute weights of the classifier, an objective function is modeled and the optimal weights are determined using a local optimization method.
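Mechanically, attribute weighting scales each attribute's log-likelihood contribution by a weight w_j, so that w_j = 1 recovers standard naive Bayes and w_j = 0 ignores attribute j entirely. The following minimal sketch demonstrates the mechanism; the weights and data are illustrative, not those of the cited algorithms:

```python
# Attribute-weighted naive Bayes sketch: each attribute's log-likelihood term
# is multiplied by a weight w_j before summing. w = [1, 1] is plain naive
# Bayes; w = [1, 0] discards the second attribute.
import numpy as np

def fit_stats(X, y):
    """Per-class, per-attribute Gaussian statistics plus class priors."""
    stats, priors = {}, {}
    for c in np.unique(y):
        Xc = X[y == c]
        stats[c] = (Xc.mean(axis=0), Xc.var(axis=0) + 1e-9)
        priors[c] = len(Xc) / len(X)
    return stats, priors

def weighted_nb_predict(X, stats, priors, weights):
    classes = sorted(priors)
    scores = []
    for c in classes:
        mu, var = stats[c]
        log_lik = -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var)
        scores.append(np.log(priors[c]) + (weights * log_lik).sum(axis=1))
    return np.array(classes)[np.argmax(scores, axis=0)]

rng = np.random.default_rng(3)
# Attribute 0 separates the classes; attribute 1 is high-variance noise.
X = np.vstack([rng.normal([0, 0], [1, 5], (200, 2)),
               rng.normal([2, 0], [1, 5], (200, 2))])
y = np.array([0] * 200 + [1] * 200)
stats, priors = fit_stats(X, y)
acc_equal = (weighted_nb_predict(X, stats, priors, np.array([1.0, 1.0])) == y).mean()
acc_zeroed = (weighted_nb_predict(X, stats, priors, np.array([1.0, 0.0])) == y).mean()
print(acc_equal, acc_zeroed)
```

The cited algorithms differ in how the weights are learned (likelihood allocation, differential evolution, deep feature statistics, or local optimization of an objective), but all plug into the prediction rule in essentially this way.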

2.2. Application of Data Stream Classification to Wind and Photovoltaic Power Data

2.2.1. Data Streaming in Wind Power

Through the continuous efforts of researchers in recent years, the data stream diagnostic techniques related to wind turbine faults have become more mature. The state indicators of a wind turbine at work, including physical indicators such as temperature, stress, and torque, and electrical indicators such as current, voltage, and power, can reflect its health to a certain extent. Currently, most of the wind turbine fault monitoring systems that have been put into use were developed based on the vibration signals of wind turbines. Simultaneously, fault diagnosis methods based on electrical indicators such as current and power are also receiving more and more attention because of their good applicability and low transformation costs.
1. Fault Diagnosis Based on Electrical Indicators
Relatively few domestic studies have addressed wind turbine fault diagnosis based on electrical indicators, whereas foreign scholars have conducted a large number of exploratory studies in this area. For example, Kia S.H. et al. [30] used model simulation to analyze the correlation between current signals and gearbox faults in motor drive systems containing gearboxes and verified the simulation results with experimental data; Stack J.R. et al. [31] achieved reliable classification of different bearing faults by detecting prominent changes in current eigenfrequencies, or energy changes over a wide frequency range, caused by different defects; and Gong X. et al. [32] carried out an effective study of bearing failure in direct-drive wind turbines based on current signals and obtained significant findings.
In recent years, some scholars in China have also begun to use electrical signals to carry out relevant fault diagnosis of electric motors. Lin Tao et al. [33] constructed an optimization neural network fault diagnosis model based on temperature characteristics and used residual analysis of the temperature indexes to obtain the fault state of gearboxes. Liang Tao et al. [34] monitored the working status of wind turbines by calculating the multivariate kurtosis and multivariate skewness indicators of wind turbine power characteristics. Li et al. [35] analyzed the connection between common faults and fault current characteristics of wind turbines and constructed a fault diagnosis model based on current signals.
Wind turbine fault diagnosis based on electrical indicators does not require the installation of additional sensors and can be directly performed on the generator’s current, voltage, and power, making it highly applicable. However, compared to temperature, vibration, and other physical signals, electrical signals contain relatively weak fault information, often obscured by the generator’s inherent electrical fluctuations and environmental noise. Consequently, extracting the electrical signal fault characteristics and other useful information is more difficult. Therefore, how to effectively extract the fault characteristics in electrical signals is also a current research focus.
2. Fault Diagnosis Based on Vibration Signals
Compared to fault diagnosis based on electrical indicators, researchers at home and abroad have focused more on vibration signals as wind turbine status indicators, aiming to realize early fault warning and diagnosis by monitoring the vibration state of wind turbines. Because the rolling bearings, gearboxes, blades, and other key components of wind turbines rotate smoothly and periodically during normal operation, periodic, nonstationary fault impact signals appear when a fault occurs. These signals allow vibration features that characterize the operating state of wind turbines to be extracted.
Foreign researchers have made considerable attempts at fault diagnosis research based on vibration signals. For example, Caesarendra W. et al. [36] proposed a feature extraction method based on the maximum Lyapunov exponent, which effectively monitors the state of low-speed bearings. Tang B. et al. [37] analyzed vibration signals after noise reduction to construct effective features and used a manifold learning algorithm to achieve the diagnosis of early weak faults in wind turbines. Barszcz T. et al. [38] diagnosed gearbox fault types by analyzing the spectral kurtosis characteristics of fault impact signals.
Domestic scholars have also conducted substantial research on wind turbine vibration signals in recent years. Liu Qingqing et al. [39] introduced the dual-tree complex wavelet transform into the vibration signal analysis of wind turbine gearboxes and achieved fault diagnosis by calculating the kurtosis values of the components in each frequency band; Zhang Xizheng et al. [40] used a genetic algorithm to optimize a back-propagation (BP) neural network, constructing a hybrid-algorithm, vibration signal-based fault diagnosis model for wind turbine gearboxes; and Guo Dongjie et al. [41] used a BP neural network and an improved wavelet transform to extract feature quantities from selected sub-band signals to diagnose and locate wind turbine faults, a method that can effectively identify early faults.
Based on the current state of vibration signal-based wind turbine fault diagnosis, more and more scholars at home and abroad have gradually shifted from traditional analysis methods to machine learning and artificial intelligence techniques: fault features are extracted from the vibration signals; machine learning methods are then used for feature fusion, dimensionality reduction, and classification; and finally, fault diagnosis is carried out through pattern recognition. Although vibration signal-based wind turbine fault diagnosis technology has matured considerably, most current methods use SAS 9.4, MATLAB R2018a, and other analysis software to analyze offline data stored in databases. However, since the working state of wind turbines in the field is dynamic and variable, developing an efficient and stable online monitoring system that tracks turbine state in a timely and accurate manner will be the next research focus.
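As a small illustration of the kurtosis-type vibration features mentioned above (synthetic signals, not data from the cited studies): periodic fault impacts make the amplitude distribution heavy-tailed, pushing kurtosis well above the Gaussian value of 3, which is what makes it a useful fault indicator.

```python
# Kurtosis as a vibration fault indicator: Gaussian background vibration has
# kurtosis ≈ 3, while periodic fault impacts (e.g. a bearing defect) produce
# a heavy-tailed amplitude distribution and a much larger kurtosis.
import numpy as np

def kurtosis(x):
    """Fourth standardized moment of the signal."""
    x = x - x.mean()
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2

rng = np.random.default_rng(4)
fs = 10_000                       # sampling rate in Hz (illustrative)
n = fs                            # one second of signal
healthy = rng.normal(0, 1, n)     # broadband background vibration only

faulty = rng.normal(0, 1, n)
impulse_period = fs // 25         # 25 impacts per second, e.g. a bearing defect
faulty[::impulse_period] += 15.0  # superimpose periodic fault impacts

print(round(kurtosis(healthy), 2))  # close to 3 for Gaussian noise
print(round(kurtosis(faulty), 2))   # far above 3
```

In practice the signal is first band-filtered (as in the spectral kurtosis work of [38]) so that the kurtosis is computed in the band where the fault impacts are concentrated.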

2.2.2. Data Streaming in Photovoltaics

Over the last decade, the installed capacity of photovoltaic (PV) power generation has repeatedly reached new highs, but the information management of PV power stations, especially the construction of station-related data platforms, has been relatively stagnant. As a result, the power generation cost and the operation and maintenance efficiency of PV power stations suffer, reflecting imbalance and insufficiency in the development of the photovoltaic industry. Utilizing photovoltaic data resources to study the power generation performance of photovoltaic equipment is therefore an important topic. Data platforms built on these resources can comprehensively monitor equipment performance, identify potential issues early, and prevent ongoing degradation from affecting the power generation quality of an entire branch. At the same time, PV sampling data can be used to analyze the characteristics of module power generation performance and the relationship between power generation and environmental parameters, which plays a crucial role in predicting the power a module should generate, screening abnormally performing components, regulating grid fluctuations, and improving the quality of photovoltaic power. By analyzing component monitoring data over a long time scale, one can track the performance degradation of component equipment in its actual operating environment, more accurately assess the health status of components after different periods of service, and realize early localization, proactive overhaul, and replacement of components degrading faster than normal.
Currently, the fault diagnosis of PV equipment mainly relies on the abnormal alarms of the input and output sides of the inverter and rarely collects data at the component level, with fault judgments based on component-level data being even more scarce. Although collecting data at the component level will increase costs to some extent, it can help to find and specify the cause and location of faults earlier. The alarm rules that come with the inverter are often based on single-point data information, which lack effective horizontal connections between devices and fail to consider vertical trends over time, often leading to false alarms and omissions. By analyzing on-site data from multiple angles to obtain information on the operating status of components, actively collecting equipment data, conducting proactive analysis, and performing active operation and maintenance, the life cycle of the equipment can be extended and the efficiency of PV power plant operation and maintenance can be improved.
Faults in photovoltaic modules are reflected in the power data, so potential equipment faults can be initially identified by detecting abnormal power in sampled modules. Among methods for module power anomaly detection based on field sampling data, Shi Xiaobing [42] carried out extensive work on power anomalies using module current data, focusing on the one hand on distinguishing component working conditions and on the other on anomaly judgment after subdividing component data by working condition. To address the nonlinearity and nonstationarity of PV power generation with respect to meteorological conditions, a Long Short-Term Memory (LSTM) prediction model was constructed to learn the patterns of change in the current measurement data and predict future measurements; the predicted values are then compared with the actual measurements, and a threshold is set to provide early warnings of anomalies. However, owing to limitations of the data source, this work did not feed environmental parameters to the network and judged anomalies based only on the component's current values, making the judgments less credible. Iddrisu et al. [43] pointed out that the 3-sigma principle from statistics should not be used as the sole basis for anomaly judgment and proposed an anomalous data recognition algorithm based on probabilistic power curves derived from copula theory. The method suits data with a high proportion of PV power anomalies, and the algorithm can improve both the anomaly judgment criteria and the adaptability of power anomaly judgment. Livani et al. [44] proposed an outlier detection method using spectral clustering for anomalous sample detection, providing enlightening insights into exploiting the horizontal linkage between device data for anomaly detection in PV module fault diagnosis. Park et al. [45] proposed using Generative Adversarial Networks (GANs) to generate virtual samples, thereby improving the model's discrimination of anomalous samples; a bearing and a train door system were used to examine the approach's capabilities. Data acquired under normal conditions are used to train the GAN, health is monitored over time using the trained GAN indicator, and anomalies are successfully detected by identifying a decrease at a point in time. This points to another path for improving the accuracy of anomaly detection models: whereas related research usually trains a model on existing samples and tries to enrich or subdivide the sample set before judgment, the adversarial GAN approach frees the anomaly detection model from relying on a large number of samples and merits further study. Shang Yongjie [46] analyzed the mathematical model of photovoltaic module power generation, compared the prediction performance of multiple methods on module power, and on this basis carried out outlier detection and corrected the prediction algorithm to further improve the effectiveness of anomaly detection.
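The prediction-plus-threshold warning scheme described above can be sketched as follows; a trivial moving-average predictor stands in for the LSTM, and the current trace and injected fault are synthetic:

```python
# Residual-threshold anomaly warning sketch: predict each module-current
# sample from recent history, then flag points whose residual exceeds a
# 3-sigma band estimated from the normal-operation period.
import numpy as np

rng = np.random.default_rng(5)
t = np.arange(500)
# Synthetic module current: slow periodic variation plus measurement noise.
current = 8 + 2 * np.sin(2 * np.pi * t / 100) + rng.normal(0, 0.1, 500)
current[300:305] -= 3.0  # injected fault: sudden current drop

def moving_average_forecast(x, window=5):
    """Predict each point as the mean of the previous `window` points."""
    pred = np.full_like(x, np.nan)
    for i in range(window, len(x)):
        pred[i] = x[i - window:i].mean()
    return pred

pred = moving_average_forecast(current)
resid = current - pred
sigma = np.nanstd(resid[:300])            # residual spread in normal operation
alarms = np.where(np.abs(resid) > 3 * sigma)[0]
print(alarms[:5])
```

As [43] notes, a bare 3-sigma rule is a weak criterion on its own; in practice the threshold would be conditioned on working state and environmental parameters, which is exactly the limitation reported for [42].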
In photovoltaic module fault diagnosis, methods based on power sampling data, derived from PV power data analysis, are easier to realize, require only low-cost station retrofits, and have greater engineering application value. Lina Zhang [47] proposed a novel fault detection and localization method for solar PV arrays, the global block stepwise approximation method. The algorithm is based on matrix processing and deep mining of the data, determining whether a PV module is faulty by evaluating voltage fluctuations, which addresses the problem of huge data volumes. The algorithm was implemented in MATLAB and verified in simulation on 100 × 100 matrix data; the results show that it can effectively detect fault locations. The method is still essentially outlier detection for power anomaly screening of single-point data, so its discriminative power is low. Pataru et al. [48] proposed a PV array fault diagnosis method using the fast oversampling principal component analysis (OS-PCA) algorithm for detecting and identifying shadow shading, short-circuit, and disconnection faults in PV arrays. By monitoring the current signal of each string, the fast OS-PCA algorithm computes the anomaly degree of each string to detect faulty strings; the photovoltaic array engineering model is optimized through error compensation, and the fault type is identified by analyzing the array's operating point at the time of the fault. This method is based on string-level data signals and does not go deeper into component-level device information.
Yan Tianyi [49] designed and realized a system platform for online monitoring and fault diagnosis, detailing the system construction, photovoltaic data monitoring methods, and inter-module communication methods, but did not elaborate on the fault judgment methods.
With the development of big data algorithms, research on data stream fault diagnosis based on clustering algorithms [50], data expansion algorithms [51], etc., has been continuously proposed by scholars, making increasing contributions to PV fault diagnosis.

3. Definitions of Various Drifts in the Data Stream and Their Forms

First and foremost, it is necessary to clarify that, in this paper, the phenomenon of model performance degradation caused by the time-varying nature of data’s statistical properties in data stream classification problems will be uniformly referred to as data drift. The types of data drift include concept drift, covariate drift, and prior probability drift. From a definitional perspective, the latter two can be considered as specialized cases of concept drift. Therefore, this section will first clarify the concept of data drift and then provide a detailed introduction to concept drift, with a focused analysis of concept drift issues in data stream classification research within energy systems.

3.1. Covariate Drift

Covariate drift is also known as virtual concept drift in some studies. The definition was first proposed by Shimodaira [52] in 2000 and focuses on variation in the input x. However, some earlier studies confuse the definition of this concept, so, drawing on the collation and summary of definitions by Jose [53], the definition of covariate drift adopted in this paper is illustrated in Figure 2.
Firstly, covariate drift is defined with respect to the mapping from X to Y. Suppose a mapping y = f(x). When covariate drift occurs, f does not change (see the curve in Figure 2), i.e., P_train(Y|X) = P_test(Y|X) is invariant, but P_train(X) ≠ P_test(X) changes significantly. Ideally, since the mapping is unchanged, the model would still make correct predictions; for example, the model on the left-hand side of Figure 2 makes exactly the right classification predictions on the right-hand side. The triangles and circles in the figure denote different classes, and the white and black colors represent the new data after covariate drift.
But reality is more complex, and this drift can be interpreted as a "difference in emphasis". In this example, the training input distribution, P_train(X), is relatively broad-spectrum, while the test input distribution, P_test(X), is restricted to a specific domain. When a model trained on a general domain is applied to a specific domain, test accuracy tends to drop. Alternatively, in some cases, the training dataset is relatively limited, so the algorithm learns incomplete or erroneous decision boundaries, and performance degrades when the model is applied to a domain with wider coverage. Unlike concept drift, however, covariate drift can be addressed by improving the model's generalization capability.
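The situation above can be sketched numerically: the concept y = f(x) is held fixed while P(X) shifts, a two-sample Kolmogorov-Smirnov test flags the input drift, and a model with an adequate decision boundary still predicts well. This is a minimal illustration, assuming SciPy is available; the distributions, threshold learner, and significance level are all illustrative choices, not part of the cited methods.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
f = lambda x: (x > 0.0).astype(int)      # fixed concept P(Y|X)

x_train = rng.normal(loc=0.0, scale=1.0, size=2000)   # broad-spectrum inputs
x_test = rng.normal(loc=2.0, scale=0.5, size=2000)    # narrow, shifted inputs

# Two-sample KS test detects the change in P(X) between train and test
ks_stat, p_value = stats.ks_2samp(x_train, x_test)
drift_detected = p_value < 0.01

# A simple threshold learned from the training data (midpoint of class means)
y_train = f(x_train)
thr = 0.5 * (x_train[y_train == 0].mean() + x_train[y_train == 1].mean())
accuracy = np.mean((x_test > thr).astype(int) == f(x_test))
```

Because f is unchanged, the learned boundary still classifies the shifted test inputs correctly even though the input drift itself is highly significant.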

3.2. Prior Probability Drift

Prior probability drift refers to a change in the category variable Y. In past studies this is generally referred to as a change in the class distribution and specifically denotes a change in the category Y with respect to the input X. Combining this with the summary of definitions by Quiñonero-Candela [54], this paper defines prior probability drift as follows:
For a mapping y = f(x), a causal model of the form P(X, Y) is assumed to be valid, which can be decomposed into P(X, Y) = P(X|Y)P(Y) according to Bayes' law. When prior probability drift occurs, P_train(X|Y) = P_test(X|Y) is invariant, meaning that the points representing the data distribution in the plot are unchanged, but f in the mapping changes (the curve in the figure), i.e., P_train(Y) ≠ P_test(Y) changes significantly.
When this type of drift occurs it can degrade model performance, but it is easier to correct: since P_train(X|Y) = P_test(X|Y) is assumed unchanged and only P_train(Y) fails, the model can be corrected by substituting the known class distribution of the test input, P_test(Y). However, the class distribution of the test input is not always known, and when P_test(Y) is unknown the solution is somewhat more complicated. One can first specify a prior distribution over valid candidates for P_test(Y) and then compute its posterior distribution based on the covariates in the given model, P(X|Y), and the test data. The prediction target is then obtained by summing the predictions weighted by the posterior probability of each candidate P_test(Y).
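When P_test(Y) is known, the correction described above amounts to rescaling the trained model's class posteriors by the new-to-old prior ratio and renormalizing. The sketch below shows this, assuming a model that outputs calibrated posteriors; the function name and example numbers are illustrative.

```python
import numpy as np

def correct_priors(posteriors, p_train, p_test):
    """Rescale class posteriors from a model trained under p_train so they
    are consistent with the known test-time class distribution p_test."""
    w = np.asarray(p_test, dtype=float) / np.asarray(p_train, dtype=float)
    adjusted = np.asarray(posteriors, dtype=float) * w   # per-class reweight
    return adjusted / adjusted.sum(axis=1, keepdims=True)

# A model trained with balanced classes is deployed where class 1 dominates
post = np.array([[0.6, 0.4],
                 [0.5, 0.5],
                 [0.3, 0.7]])
corrected = correct_priors(post, p_train=[0.5, 0.5], p_test=[0.2, 0.8])
```

Note how a borderline sample such as the first row, originally assigned to class 0, flips to class 1 once the drifted prior is accounted for.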

3.3. Concept Drift

Concept drift is a phenomenon of data variation in which the statistical properties of the target domain change over time in an arbitrary manner, deviating from the source-domain data in a non-negligible way. It was originally proposed by Schlimmer et al. [55] to highlight changes in noisy data over time in incremental learning. Liu [56] showed that concept drift may be caused by changes in hidden variables that cannot be measured directly. Formally, concept drift is defined as in Figure 3:
For a given time period [0, t], there is a set of samples, denoted S_{0,t} = {d_0, ..., d_i, ..., d_t}, where d_i = (X_i, y_i) is an observation (or data instance), X_i is the feature vector, y_i is the label, and S_{0,t} follows a determined distribution F_{0,t}(X, y). Concept drift occurs at time t + 1 if F_{0,t}(X, y) ≠ F_{t+1,∞}(X, y), that is, ∃t: P_t(X, y) ≠ P_{t+1}(X, y).

3.3.1. Different Forms of Concept Drift

Considering the characteristics of power streaming data, this section describes three forms of concept drift: label drift, feature drift, and instance drift.
  • Label drift
Considering the consistency of data distribution between the training set and the test set, labeled data used for defect prediction face a severe class imbalance problem, i.e., y_defect/(y_clean + y_defect) ≠ y_clean/(y_clean + y_defect) [57]. Class-imbalanced labeled data cause the trained defect model to gain more knowledge about the majority class while ignoring the minority class. Data instances in the minority class are often the objects of interest for practitioners, such as defect modules in software projects. This may cause the defect model to produce false alarms when judging potential defect modules. Considering the data distribution of the training and test sets for cross-version defect prediction, if the prior probabilities can be obtained as P(y) = y_defect/(y_clean + y_defect), then label drift can be expressed as P_{D_i}(y) ≠ P_{D_{i+1}}(y). Label drift causes a deviation between the knowledge learned by machine learning in the source domain and the knowledge of the target domain [58]. In a past study, Zhang et al. [59] calculated label drift as shown in the formula below:
φ(y_i, y_{i+1}) = Σ_k min(n_i(y_k), n_{i+1}(y_k)) / n,   y_k ∈ (y_i ∪ y_{i+1})
where n_i(y_k) denotes the number of instances labeled y_k in S_i, k indexes the categories in y_i ∪ y_{i+1}, and n denotes the number of instances in S_i. This method is suitable for label-drift detection between two data blocks in streaming data.
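The formula above can be implemented directly: sum the per-class count overlap between two consecutive blocks and normalize by the block size n. This is a sketch following Zhang et al.'s definition; the variable names and example blocks are illustrative.

```python
from collections import Counter

def label_drift_overlap(labels_i, labels_j):
    """phi(y_i, y_{i+1}): overlap of per-class counts, normalized by n."""
    n = len(labels_i)
    c_i, c_j = Counter(labels_i), Counter(labels_j)
    classes = set(c_i) | set(c_j)            # y_k in y_i ∪ y_{i+1}
    return sum(min(c_i[k], c_j[k]) for k in classes) / n

# An 80/20 class balance shifting to 50/50 yields a reduced overlap
block_a = [0] * 80 + [1] * 20
block_b = [0] * 50 + [1] * 50
phi = label_drift_overlap(block_a, block_b)  # (min(80,50)+min(20,50))/100
```

A value of 1.0 means the class distributions of the two blocks coincide; lower values indicate stronger label drift.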
2. Feature Drift
From a feature perspective, feature drift is caused by changes in code attributes, such as the number of lines of code, cyclomatic complexity, and the number of branching paths. These changes alter the correlations between feature variables and response variables, making drift hidden and difficult to detect. If a feature variable is strongly correlated with the response variable, it contributes more significantly to defect prediction. Figure 4 visualizes the change in the importance of feature variables between versions, with the thickness of the stripes indicating importance. Tsymbal et al. [60] pointed out that even if concept drift occurs in the entire dataset, some regions of the feature space remain stable longer than others.
Feature-based concept drift detection is a common approach in previous studies. For example, Alippi et al. [61] proposed a real-time adaptive classifier based on feature changes to detect data drift. Yu [62] addressed concept drift in each feature variable using the Kolmogorov–Smirnov (KS) test in the case of relatively few true labels. Zhang et al. [59] calculated feature drift by evaluating the contribution of feature variables to labels to detect whether there is a significant change between historical and new data blocks as shown in the Equation below:
φ(IV_i, IV_{i+1}) = Jac(F_i, F_{i+1}) = |F_i ∩ F_{i+1}| / |F_i ∪ F_{i+1}|
where IV_i denotes the feature vector of X_i and F_i denotes the feature set reconstructed from IV_i according to specific conditions. When φ(IV_i, IV_{i+1}) is smaller than a set threshold, feature drift can be considered to have occurred, because the importance of the feature variables related to classification has changed significantly between the two data blocks.
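The Jaccard comparison above reduces to a set operation on the important-feature sets of the two blocks. Below is a minimal sketch; the feature names and the 0.5 threshold are illustrative, not values from the cited study.

```python
def jaccard(f_i, f_j):
    """Jac(F_i, F_{i+1}) = |intersection| / |union| of two feature sets."""
    f_i, f_j = set(f_i), set(f_j)
    return len(f_i & f_j) / len(f_i | f_j)

# Top-ranked features (e.g., by importance score) in two consecutive blocks
top_old = {"loc", "cyclomatic", "branches", "fanout"}
top_new = {"loc", "cyclomatic", "coupling", "depth"}

phi = jaccard(top_old, top_new)    # 2 shared features out of 6 in the union
feature_drift = phi < 0.5          # illustrative threshold
```

Here only two of six features remain important across blocks, so the similarity falls below the threshold and feature drift is flagged.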
3. Instance Drift
Figure 4b shows an example of the change in the distribution of defective instances for any two versions of a particular project, with triangles and circles representing different categories of instances. Instance drift is a more granular type of data drift than feature drift. Instance drift is more difficult to detect because individual data instances do not carry enough information to infer the overall distribution.
Data distribution-based drift detection methods can directly quantify the severity of concept drift. Such algorithms use a distance function or metric to quantify the dissimilarity between historical and new data distributions. For example, Dasu et al. [63] designed a drift detection technique using Kullback–Leibler divergence (relative entropy). Lu et al. [64] proposed an empirical distance based on competency to represent the difference between two data samples. Kifer et al. [65] used a relaxation of the total variation distance to measure the difference between two data distributions. In addition, locating concept drift regions facilitates drift adaptation. Lu et al. [64] showed that identifying drift regions helps to identify outdated data that conflict with the current concept. Lu et al. [66] solved the concept drift problem by eliminating conflicting instances. Local Drift Degree-based Density Synchronized Drift Adaptation (LDD-DSDA) [56] utilizes drift regions as an instance selection strategy to construct a training set that continuously tracks new concepts. When concept drift is detected, conflicting instances are removed from the instance pool.
Many previous studies have used online error rates to detect whether a new instance of data is a conflicting instance. Gama et al. [67] proposed the Drift Detection Method (DDM) to identify significant increases in the overall online error rate within a time window. Herbold et al. [68] also used the online error rates of nodes within the decision tree to detect drift data in localized regions of the instance space. Because this approach focuses on the accuracy of the learner, it cannot directly measure the severity of instance drift. However, the degree of learning accuracy degradation can be used as an indirect measure to indicate the severity of concept drift.
To clarify the extent of instance drift between versions, the two principles for detecting conflicting instances described above (distance-based and error rate-based) are used to identify such instances in the data. When the classifier error rate decreases (i.e., accuracy increases) after the detector removes an instance from the training set, the instance is confirmed to be a conflicting instance; otherwise, it is not. From this, instance drift (ID) is obtained by the following equation.
ID = (conflict instances / all instances) × 100%
where ID indicates the membership of the defect data from the historical version in the new version. For example, if a historical version contains 200 defect instances, 12 of which have a membership probability below the threshold in the new version's data distribution, then ID = 12/200 = 6%.
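The ratio above can be sketched as follows, assuming the conflict test has already been run per instance (an instance is conflicting when removing it lowers the classifier error rate). The per-instance error rates here are illustrative stand-ins for a real leave-one-out style detector.

```python
def instance_drift(error_with, error_without):
    """ID in percent. error_with[i]: error rate with instance i kept;
    error_without[i]: error rate after removing instance i."""
    conflicts = sum(1 for w, wo in zip(error_with, error_without) if wo < w)
    return conflicts / len(error_with) * 100.0

# 200 historical instances; removing 12 of them reduces the error rate
e_with = [0.10] * 200
e_without = [0.08] * 12 + [0.10] * 188
id_pct = instance_drift(e_with, e_without)
```

This reproduces the worked example in the text: 12 conflicting instances out of 200 give ID = 6%.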

3.3.2. Concept Drift over Different Time Intervals

In data stream classification, the data volume is so large that the space required to store the data far exceeds memory capacity. To enable algorithms to handle massive data, a sliding window mechanism is generally used: only one or several blocks of data are input to the system at a time, and processing of the data block in the current window must be completed before the next window's block is processed.
Let M be the target concept obtained by training on the data in the sliding window at time t. After time Δt, the sliding window is trained again to obtain concept N. If M ≠ N, the data stream is said to have undergone concept drift. As shown in Figure 5, concept drift can be categorized into two types according to Δt: when Δt is short, it is called abrupt concept drift, and when Δt is long, it is called gradual concept drift.
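The windowed processing and M-versus-N comparison described above can be sketched as follows. This is a minimal illustration: only one block is held in memory at a time, each window is summarized into a "concept" (here simply a mean), and consecutive summaries are compared. The summary statistic and tolerance are illustrative, not a specific published method.

```python
from collections import deque

def tumbling_windows(stream, size):
    """Yield consecutive fixed-size blocks; only one block is in memory."""
    window = deque(maxlen=size)
    for item in stream:
        window.append(item)
        if len(window) == size:
            yield list(window)
            window.clear()           # finish this block before the next one

stream = range(10)
blocks = list(tumbling_windows(stream, size=5))
concepts = [sum(b) / len(b) for b in blocks]     # per-window concept M, N
drifted = abs(concepts[1] - concepts[0]) > 1.0   # M != N beyond tolerance
```

In a real pipeline the per-window summary would be a trained model rather than a mean, but the control flow (fill window, process, discard, compare) is the same.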
After concept drift occurs, the distribution of data within the sliding window changes and classifier performance declines; if no countermeasures are taken, the error rate of the classification results will continue to rise. How to detect concept drift more effectively and take measures to cope with it is an urgent problem in data stream classification.

3.3.3. Concept Drift Detection Methods

Considering the different characteristics of detection algorithms, four different classes of concept drift detection algorithms are presented in this section: error rate-based concept drift detection methods; window-based concept drift detection methods; data distribution-based concept drift detection methods; and multiple hypothesis testing drift detection methods.
  • Error Rate-based Concept Drift Detection Method
The error rate-based concept drift detection method is the largest and most widely used class of detection methods; it raises a concept drift alarm when the recorded online error rate of the base learner changes in a statistically significant way. The best-known error rate-based method is the Drift Detection Method (DDM) proposed by Gama [67]. It was the first algorithm to categorize concept drift alarms into warning and drift levels. DDM maintains the error rate of the base classifier and uses the mean and variance of the error rate as detection statistics. A warning is issued when the statistic exceeds the warning level, and if it continues to increase above the drift level, a concept drift signal is generated. The method works well for abrupt concept drift but is not very sensitive to gradual concept drift, and DDM depends strongly on classifier performance. The Early Drift Detection Method (EDDM) proposed by Baena-García et al. [69] is similar to DDM, using the same warning- and drift-level mechanism, but it uses the distance-error rate instead of the classifier error rate. The main difference is that EDDM provides good accuracy compared with DDM when both abrupt and gradual concept drift occur. Another method is the EWMA for Concept Drift Detection (ECDD) algorithm proposed by Ross et al. [70], which applies exponentially weighted moving average charts to the online probability of classification success or failure. The base learner uses training examples to determine the initial classification accuracy, and the estimator tracks the expected time between two false-positive samples. ECDD is more suitable for detecting gradual concept drift than abrupt concept drift. The Reactive Drift Detection Method (RDDM) proposed by Barros et al. [71] is based on DDM with modifications such as discarding old instances of very long concepts to detect drifts as early as possible, which improves classifier accuracy and increases sensitivity to gradual concept drifts. A variant of RDDM is the Fuzzy Windowing Drift Detection Method (FW-DDM) [72], which uses a fuzzy windowing mechanism for concept drift detection: it identifies examples from different concepts more accurately through sliding and overlapping windows, and it handles noise by checking how accurately each instance is represented under both the old and the current concept, treating an instance as noise if accuracy degrades under both. Similar error rate-based methods include the Hoeffding's inequality-based Drift Detection Method (HDDM) [73], Learning with Local Drift Detection (LLDD) [74], and the Dynamic Extreme Learning Machine (DELM) [75], among others.
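The DDM statistics described above can be sketched compactly: track the running error rate p and its standard deviation s, remember the best (lowest) p_min + s_min, warn beyond p_min + 2·s_min, and signal drift beyond p_min + 3·s_min. The thresholds and 30-sample warm-up follow Gama et al.'s description; the simulated error stream below is illustrative.

```python
import math

class DDM:
    """Minimal DDM-style detector on a stream of 0/1 classification errors."""

    def __init__(self, min_samples=30):
        self.n = 0
        self.p = 0.0                       # running online error rate
        self.s = 0.0                       # its standard deviation
        self.p_min = float("inf")
        self.s_min = float("inf")
        self.min_samples = min_samples

    def update(self, error):               # error: 1 if misclassified, else 0
        self.n += 1
        self.p += (error - self.p) / self.n
        self.s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        if self.p + self.s > self.p_min + 3.0 * self.s_min:
            return "drift"
        if self.p + self.s > self.p_min + 2.0 * self.s_min:
            return "warning"
        return "stable"

det = DDM()
for i in range(300):                       # stable concept: ~10% error rate
    state = det.update(1 if i % 10 == 0 else 0)
for _ in range(100):                       # abrupt drift: errors surge to 100%
    state = det.update(1)
    if state == "drift":
        break
```

As the text notes, such a detector reacts quickly to the abrupt error surge simulated here, but a slow upward creep in the error rate would raise the alarm much later.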
2. Window-based Concept Drift Detection Methods
Window-based concept drift detection methods typically maintain two windows of data instances: the first stores old instances and the second stores new instances of the data stream. Comparing these two windows reveals changes in the data distribution and signals drift. The window size can be fixed or adaptive; a fixed window keeps the same size for the entire analysis, while an adaptive window adjusts its size according to drift conditions, shrinking if drift is detected and enlarging if not. The best-known algorithm is the Adaptive Windowing algorithm (ADWIN) proposed by Bifet [76]. It adaptively adjusts the window size according to the data distribution, dynamically growing or shrinking the window depending on whether the distribution is stable or changing, and raises a drift alarm when the difference between the average distributions of two consecutive sub-windows exceeds a predefined threshold. ADWIN performs well in drift detection because it bounds the false-alarm and miss rates. Its limitation is that it only handles one-dimensional data; for multidimensional data it must maintain a window for each dimension, which costs substantial time and memory. Bifet et al. [76] proposed a modified version of ADWIN that requires less time and memory. The One-Class Drift Detector (OCDD) proposed by Can et al. [77] is an implicit concept drift detector based on the sliding window mechanism. It approximates the distribution of new concepts, classifies samples, and estimates whether they belong to the current distribution or are outliers. The percentage of outliers is calculated in real time, and a sliding window uses this percentage to determine whether concept drift has occurred. It can work with any classifier that lacks a drift adaptation mechanism.
Another window-based drift detector is Paired Learners (PL), proposed by Bach [78]. It uses two base learners: stable and reactive. The stabilizing learner trains based on previous instances and predicts new instances. The reactive learner learns recent instances and predicts instances in the current window. It uses the difference in accuracy between these two learners to detect concept drift. Other window-based drift detection algorithms are Statistical Test of Equal Proportions Detection (STEPD) [79], the Fast Hoeffding Drift Detection Method (FHDDM) [80], and the Group Drift Detection Method (GDDM) [81], among others.
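The two-window principle behind these detectors can be illustrated with a much-simplified sketch: compare the means of an "old" and a "new" sub-window and flag drift when the gap exceeds a Hoeffding-style bound. This is only an illustration of the idea, not the exact ADWIN cut condition; the bound, delta, and data are assumptions.

```python
import math

def windows_differ(old, new, delta=0.05):
    """Flag drift when the mean gap between two windows exceeds a
    Hoeffding-style threshold (simplified two-window comparison)."""
    m = 1.0 / (1.0 / len(old) + 1.0 / len(new))       # harmonic sample size
    eps = math.sqrt(math.log(4.0 / delta) / (2.0 * m))  # confidence bound
    gap = abs(sum(old) / len(old) - sum(new) / len(new))
    return gap > eps

old_win = [0, 1, 0, 0, 1, 0, 0, 0] * 10   # old error indicators, mean 0.25
new_win = [1, 1, 1, 0, 1, 1, 1, 1] * 10   # new error indicators, mean 0.875
drift = windows_differ(old_win, new_win)
```

The full ADWIN additionally searches over all split points of one adaptive window and shrinks it when a split fails this test, which is what gives its false-alarm and miss-rate guarantees.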
3. Concept Drift Detection Methods based on Data Distribution
Concept drift detection algorithms based on data distribution detect drift by comparing differences between old or historical data and current data instances. These methods are often used in conjunction with window-based methods and rely on statistical significance analysis; computing the change in data distribution also yields information about where the drift occurred.
The hybrid forest algorithm proposed by Rad [82] focuses on random forests built from Hoeffding trees. The method is divided into three phases: in the first, a data frame is created to obtain multiple features; in the second, weak learners are generated, each learning from a subset of the features; in the third, all learners run simultaneously. A final combining phase computes the result, based on which it is decided whether concept drift has occurred. Another typical method is the Information-Theoretic Approach (ITA) proposed by Dasu [63]. Its basic idea is to use a k-d tree to partition the old and new data into a series of boxes, compute the Kullback–Leibler (KL) divergence between the densities in each box, and finally apply hypothesis testing on the divergence to determine whether concept drift has occurred. Similar distribution-based detection algorithms include Statistical Change Detection (SCD) [83], Competence Model-based drift detection (CM) [66], Equal Density Estimation (EDE), and Local Drift Degree-based Density Synchronized Drift Adaptation (LDD-DSDA) [56].
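The KL-divergence comparison at the heart of such methods can be sketched with simple equal-width bins in place of ITA's k-d tree partitioning. The binning, epsilon smoothing, and sample data below are illustrative assumptions.

```python
import numpy as np

def kl_divergence(old, new, bins=10, eps=1e-9):
    """KL divergence between binned densities of two one-dimensional
    samples (equal-width bins over the combined range)."""
    lo = min(old.min(), new.min())
    hi = max(old.max(), new.max())
    p, _ = np.histogram(old, bins=bins, range=(lo, hi))
    q, _ = np.histogram(new, bins=bins, range=(lo, hi))
    p = p / p.sum() + eps                  # smooth to avoid division by zero
    q = q / q.sum() + eps
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(1)
same = kl_divergence(rng.normal(0, 1, 5000), rng.normal(0, 1, 5000))
drift = kl_divergence(rng.normal(0, 1, 5000), rng.normal(1.5, 1, 5000))
```

Two samples from the same distribution give a divergence near zero, while a shifted distribution gives a clearly larger value; a hypothesis test (e.g., via bootstrapping) would then decide whether the observed divergence is significant.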
4. Multiple Hypothesis Testing Drift Detection Methods
The multiple hypothesis testing detection method uses algorithms similar to the previous three categories but is characterized by applying multiple hypothesis tests to detect concept drift along different paths; that is, after a drift is detected, parallel or hierarchical hypothesis tests are applied repeatedly to verify the result and ensure statistical significance. Compared with classical detection methods, multiple hypothesis testing enhances the reliability and stability of the detection results. Just-In-Time adaptive classifiers (JIT), proposed by Alippi [84], were the first to apply multiple hypothesis testing to drift detection. The algorithm extends the Cumulative Sum (CUSUM) chart and sets up four parallel hypothesis tests to detect average changes in the learning system's features of interest. A similar strategy is used in Linear Four Rate drift detection (LFR) [85], proposed by Wang. Likewise, Zhang proposed a three-layer drift detection algorithm based on Information Value and Jaccard similarity (IV-Jac) [59]. The Ensemble of Detectors (e-Detector) [86], the Drift Detection Ensemble (DDE) [87], and the Hierarchical Change-Detection Tests (HCDTs) [88] proposed by Boracchi differ from the above algorithms: they innovatively employ layered multiple hypothesis testing, dividing the algorithm into a detection layer (an existing detection method) and a validation layer (an additional hypothesis test that performs a secondary check for drift). Similarly, the Hierarchical Linear Four Rate (HLFR) method [89] proposed by Yu adds detection layers to the LFR method above.
In addition, there are Two-Stage Multivariate Shift-Detection based on EWMA [90], Hierarchical Hypothesis Testing with Classification Uncertainty (HHT-CU) [62], and other methods that also use multiple hypothesis testing.
To date, a great many concept drift detection methods have been proposed, as shown in Table 1. Error rate-based detection algorithms have wide applicability, easily handle multidimensional data, are generally of low complexity, and readily meet the real-time requirements of streaming data. Window-based detection relies on recent data and does not depend on a learner, but handling high-dimensional data is more complex. Distribution-based detection algorithms start from the data itself, mining statistical information or distribution changes. Multiple hypothesis testing algorithms increase the statistical significance of drift results by improving the hypothesis testing of existing algorithms, but they also increase algorithmic complexity.

3.3.4. Model Updating Strategies for Addressing Concept Drift

When a detection algorithm detects the occurrence of concept drift and issues an alert, the algorithm needs to update the existing model to accommodate the new concept to ensure the performance and accuracy of the learning model. There are generally three broad strategies for updating models: retraining a new learning model to replace the old one; modifying the old model; and fusing multiple old models to construct a new one.
The typical strategy of retraining a new model to replace the old one is used in the PL algorithm [78] proposed by Bach. The method pairs a stable online learner with a reactive online learner: the stable learner predicts based on all of its experience, while the reactive learner predicts based on its experience in a recent short-term time window. The method compares the accuracy of the two learners over the most recent window; the reactive learner performs better when concept drift occurs, and when the accuracy difference reaches a threshold, the stable learner is replaced by one built from the target concepts of the reactive learner, after which the new stable learner predicts based on all its experience from the time of replacement.
A similar scheme is used in ADWIN [76], but it optimizes window selection: the window is no longer of a fixed size determined a priori, as in older methods, but is computed adaptively based on the observed rate of change in the windowed data. The window grows automatically when the data are stationary and stable, improving accuracy, and shrinks when the data change, discarding obsolete data. A series of algorithms such as DELM [75] and JIT [84] follow this update strategy.
Compared with retraining a new model, adaptively adjusting a model to update the relevant parts in response to changes in the data distribution is a more effective update strategy when localized data drift occurs. However, this approach is limited by the learning algorithm itself, which must be able to examine and adapt each data subregion individually; in general, models based on decision tree algorithms can apply this strategy. The Concept-adapting Very Fast Decision Tree (CVFDT) [92] is based on the Very Fast Decision Tree (VFDT) [93] and can learn decision trees from high-speed time-varying data streams: when drift occurs and the performance of an old subtree begins to decline, CVFDT grows an alternate subtree; when the accuracy of the new subtree exceeds that of the old one, it replaces the old subtree, which is then pruned. This guarantees the algorithm's timeliness while making full use of old data. Similar decision tree-based algorithms include the Extremely Fast Decision Tree [94], the concept-adapting evolutionary algorithm for decision trees (CEVDT) [95], and the Uncertainty-handling and Concept-adapting Very Fast Decision Tree (UCVFDT) [96].
The strategy of reusing and aggregating old models can update a model more cost-effectively and efficiently in response to recurring concept drift. The Dynamic Weighted Majority (DWM) [97] proposed by Kolter maintains a collection of base learners, each with an associated weight. Given an instance, each base learner makes a prediction, and DWM combines these with their weights to give a global prediction by weighted majority vote; base learners are dynamically created and removed based on changes in performance. When the global prediction is wrong, a new learner is added to the set. When a base learner's prediction is wrong, DWM down-weights that learner, and if it errs repeatedly and its weight falls below a set threshold, the learner is deleted. However, the algorithm only down-weights and never up-weights; a base learner is never reactivated after elimination, and its usage time and prediction history are not considered. The Learn++.NSE method [98] proposed by Elwell determines the weight of each classifier's vote based on its time-adjusted accuracy in the current and past environments. For each new batch of data, the algorithm creates a new classifier and integrates it into the existing ensemble, then dynamically adjusts weights based on classifier performance on recent and past data. Poorly performing classifiers receive lower weights and are temporarily deactivated when their weights fall below a set threshold. The algorithm does not discard such classifiers altogether; they still participate in each round of prediction, and if a deactivated classifier becomes relevant and effective again in the future, its weight is increased and it is reactivated.
Such a design allows the algorithm to retain historically learned knowledge, enabling it to adapt more efficiently to concept drifts that may recur or change periodically. Over time, however, the accumulation of the number of classifiers in the ensemble can lead to significant consumption of memory and computational resources.
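The vote-then-down-weight core of DWM described above can be sketched in a few lines. The beta = 0.5 halving and weight-threshold pruning follow the DWM idea; the learner objects here are stand-in callables and the function signature is illustrative, not a specific library API.

```python
def dwm_predict_and_update(learners, weights, x, y_true,
                           beta=0.5, theta=0.01):
    """One DWM-style step: weighted majority vote, then halve the weight of
    every learner that erred and prune learners below the threshold."""
    preds = [learner(x) for learner in learners]
    votes = {}
    for p, w in zip(preds, weights):
        votes[p] = votes.get(p, 0.0) + w
    y_pred = max(votes, key=votes.get)               # weighted majority vote
    new_w = [w * beta if p != y_true else w
             for p, w in zip(preds, weights)]        # down-weight mistakes
    kept = [(l, w) for l, w in zip(learners, new_w) if w >= theta]  # prune
    return y_pred, kept

always_0 = lambda x: 0
always_1 = lambda x: 1
pred, ensemble = dwm_predict_and_update([always_0, always_1], [1.0, 1.0],
                                        x=None, y_true=1)
```

After one step on a class-1 instance, the learner that predicted 0 has its weight halved while the correct learner keeps full weight; in the full DWM, a new learner would also be spawned whenever the global prediction itself is wrong.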
In addition, in a recent study, the Online Gaussian Mixture Model with a Noise Filter for Handling Virtual and Real Concept Drifts (OGMMF-VRD) [99] proposed by Oliveira considers the impact of both real and virtual drifts on the adaptation and prediction performance of the decision boundaries learned by classifiers over time, and adds a noise filter to avoid adapting the model to problematic observations. The algorithm keeps all old Gaussian Mixture Models in a pool, allowing the system to select the optimal classifier. This pool makes it more efficient at reusing historically learned knowledge than traditional nonensemble models, and it is more robust than ensemble models to false drift alarms caused by noisy samples. However, since it is based on a single classifier, its performance is limited.
In general, existing research uses various model update strategies to cope with concept drift: retraining a new model is the most universal strategy and can be applied to all models; adaptive model adjustment adapts well to concept drift caused by changes in data distribution, but applies only to models whose algorithms can individually examine and adapt data subregions, such as decision trees; and reusing and aggregating old models is more efficient for handling recurring or periodically changing concepts.

3.4. The Issue of Concept Drift in Data Stream Classification Research within Energy Systems

In energy systems, the data streams from photovoltaic and wind power generation exhibit significant uncertainty due to time-varying environmental conditions and equipment aging. This uncertainty can lead to frequent changes in data distribution, affecting the mapping relationship between environmental factors and output results and thereby causing concept drift. With the rapid development of renewable energy, data stream classification methods have been widely applied in the energy field. However, as shown in Section 2, much past research has focused solely on the static optimization of the algorithms themselves, often overlooking the performance degradation caused by concept drift in practical applications. Some recent studies have begun to address this issue. Severiano et al. [100] proposed an evolving Multivariate Fuzzy Time Series (e-MVFTS) model, which integrates the evolving mechanism of Typicality and Eccentricity Data Analytics (TEDA) into the fuzzy representation of time series used in the FTS model. This integration allows dynamic adjustment of the fuzzy set definitions and the range of the membership functions, effectively addressing spatiotemporal information and uncertainty. The model adapts to changes in data distribution through online updates and effectively handles multivariate high-dimensional data. It has been validated in solar and wind energy forecasting tasks, demonstrating improved prediction accuracy in renewable energy systems. However, the model's performance depends heavily on the parameter settings of the clustering algorithm and the fuzzy logic. Additionally, in extreme cases where severe concept drift occurs suddenly, the algorithm may fail to promptly eliminate incorrect sets and rules, as it lacks the necessary pruning mechanism. Zhang et al.
[101] proposed an LSTM-based photovoltaic power forecasting model that employs the traditional DDM for drift detection. The model uses orthogonal weight modification to adjust the model weights progressively and adapt to new data distributions, preserving model stability while avoiding overfitting to historical data. However, as mentioned earlier, DDM is a relatively rudimentary method that is insensitive to gradual concept drift, and its performance depends strongly on parameter settings and on the underlying classifier. Improved variants of DDM, such as EDDM and RDDM, have shown better drift detection performance. Li et al. [102] proposed the Drift Detection and Adaptation–Light Gradient Boosting Machine (DDA-LightGBM) model for short-term wind power forecasting. The method consists of three steps: extracting sample-related and sequence-related features from weather data, clustering the data based on these features, and developing a power forecasting model with automatic drift detection and adaptation capabilities for each cluster. For drift detection, the study employs a sliding-window method; for model updating, LightGBM supports incremental learning, so the model can be partially updated on new data after a drift occurs, reducing computational cost. Although the article demonstrates, through comparisons of prediction results on a test set, that this method can address concept drift and improve forecasting performance, several issues remain. 
The parameters of the drift detection algorithm and the model update strategy, such as the sliding window size, the maximum cache window size, and the sampling rate, are fixed values optimized for particular clusters. These values offer a clear performance advantage only in specific scenarios and lack generalizability. Adaptive methods such as ADWIN, OCDD, and PL, which adjust model parameters automatically, could greatly enhance the generalizability of the model presented in that article. Many other studies address the concept drift problem in the classification of data streams in energy systems. For instance, Cabello et al. [103] compared the impact of three drift detection methods, including ADWIN and KSWIN, on a linear regression model under various parameter settings, successfully demonstrating the importance of drift detection techniques in wind power forecasting. Wu et al. [104] introduced a replacement learning-based online adaptation framework for multivariate multistep time-series forecasting in the energy domain, which tackles concept drift by retraining models from scratch. The approach leverages clustering-based sampling and dimensionality reduction to incorporate historical data, ensuring that the model adapts to changing data distributions over time. The paper also proposes an incremental learning variant that updates the existing model in place to address concept drift. Lee et al. [105] addressed concept drift in photovoltaic power output forecasting by proposing a model-agnostic online forecasting method, Model-Agnostic Online Forecasting (MAOF), which transforms a forecasting problem in a batch learning setting into one in an online learning setting. 
An online learning algorithm then adjusts the prediction output in real time rather than retraining the model, which makes the method applicable to any forecasting model, independent of its specific structure.
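The DDM detector referenced above (Gama et al. [67], used by Zhang et al. [101]) is compact enough to sketch directly. The following is a minimal, illustrative implementation assuming a stream of 0/1 classification errors; the `min_samples` warm-up value is a common practical choice, not part of the original formulation. The detector tracks the running error rate and flags a warning at two standard deviations above the best level seen so far, and a drift at three:

```python
import math

class DDM:
    """Drift Detection Method (Gama et al., 2004), minimal sketch.

    Tracks the running error rate p of a classifier's 0/1 error stream
    and its standard deviation s = sqrt(p * (1 - p) / t).  It warns when
    p + s exceeds p_min + 2 * s_min (the best level seen so far) and
    signals drift when it exceeds p_min + 3 * s_min.
    """

    def __init__(self, min_samples=30):  # warm-up length: a practical assumption
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.t = 0
        self.p = 1.0
        self.s = 0.0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, error):
        """error: 1 if the base classifier misclassified this sample, else 0."""
        self.t += 1
        self.p += (error - self.p) / self.t          # incremental error rate
        self.s = math.sqrt(self.p * (1.0 - self.p) / self.t)
        if self.t < self.min_samples:
            return "stable"
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s  # new best operating point
        if self.p + self.s > self.p_min + 3.0 * self.s_min:
            self.reset()                             # caller should retrain here
            return "drift"
        if self.p + self.s > self.p_min + 2.0 * self.s_min:
            return "warning"
        return "stable"

# Error rate jumps from 10% to 60% at t = 500 (a deterministic abrupt drift).
errors = [1 if i % 10 == 0 else 0 for i in range(500)] + \
         [1 if i % 10 < 6 else 0 for i in range(500)]
ddm = DDM()
states = [ddm.update(e) for e in errors]
print(states.index("drift"))  # fires shortly after the change point at t = 500
```

As the text notes, a detector of this kind responds quickly to abrupt drift but is insensitive to gradual drift, since slowly rising error keeps updating `p_min` and the threshold drifts upward with it.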
From the examples above, it is evident that applying existing solutions from the concept drift domain to the relatively emerging field of energy system data stream classification can effectively improve classification efficiency. This represents a significant trend for the future development of this field.
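For readers unfamiliar with the adaptive-window detectors recommended above, the following sketch condenses the core idea of ADWIN (Bifet and Gavaldà [76]): keep a variable-length window of recent values and shrink it whenever two sub-windows differ significantly. The class name `SimpleADWIN`, the `delta` default, and the raw-list window are simplifying assumptions for illustration; the published algorithm uses exponential histograms to achieve logarithmic memory and time.

```python
import math

class SimpleADWIN:
    """Illustrative, simplified ADWIN-style change detector.

    Stores the raw window of recent values; at each step it tests every
    split of the window and, if the two sub-window means differ by more
    than a Hoeffding-style bound, drops the stale head of the window and
    reports a drift.  O(n) per step here; the real ADWIN achieves
    O(log n) via exponential histograms.
    """

    def __init__(self, delta=0.002, min_sub=5):
        self.delta = delta      # confidence parameter of the cut test
        self.min_sub = min_sub  # smallest sub-window worth testing
        self.window = []

    def update(self, x):
        """Append one value (e.g. a 0/1 error); return True if drift detected."""
        self.window.append(x)
        drift = False
        while self._find_cut():
            self.window.pop(0)  # discard the oldest element and re-test
            drift = True
        return drift

    def _find_cut(self):
        n = len(self.window)
        if n < 2 * self.min_sub:
            return False
        total = sum(self.window)
        left = 0.0
        for i in range(1, n):
            left += self.window[i - 1]
            n0, n1 = i, n - i
            if n0 < self.min_sub or n1 < self.min_sub:
                continue
            # harmonic-mean sample size and Hoeffding-style cut threshold
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            eps = math.sqrt(math.log(4.0 * n / self.delta) / (2.0 * m))
            if abs(left / n0 - (total - left) / n1) > eps:
                return True
        return False

# A mean shift from 0.2 to 0.8 at t = 300; the window shrinks soon after.
adwin = SimpleADWIN()
stream = [0.2] * 300 + [0.8] * 300
drift_points = [t for t, x in enumerate(stream) if adwin.update(x)]
print(drift_points[0])  # first detection, some samples after the shift at t = 300
```

Because the window length itself adapts, no fixed window size or cache size needs tuning per cluster, which is precisely the generalizability advantage argued for above.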

4. Discussion and Conclusions

In this paper, the current mainstream classification methods for wind power/photovoltaic data streams are discussed in detail and reviewed from multiple perspectives, including structure extension, instance weighting, attribute selection, and attribute weighting. The paper also analyzes research on data stream techniques for fault diagnosis in wind and photovoltaic systems and reviews the application of these methods.
In addition, the drift problem in data streams is discussed in detail, with a comprehensive review of the different forms and time scales of concept drift, as well as of the detection methods and model updating strategies proposed by domestic and foreign scholars to cope with it. The discussion of drift issues in energy system data stream classification reveals several common problems in existing studies on wind power/photovoltaic data stream classification. Most current research focuses only on static optimization of models and lacks dynamic optimization to address data drift. The few recent studies that do address data drift mostly rely on generic handling methods; although they effectively highlight the importance of tackling data drift in this field, they fall short of deeper, more comprehensive research. Using the studies outlined in Section 3 as examples, several prevalent issues emerge: models often have inherent limitations that restrict them to specific scenarios; detection mechanisms tend to be overly simplistic, leading to inefficiencies; and update strategies are often inadequate for sudden drifts under extreme conditions. These challenges stem primarily from a lack of systematic understanding, among researchers in energy system data stream classification, of effective methods for addressing data drift.
Through a detailed analysis of relevant studies, this paper highlights the substantial potential of data drift research in the classification of wind power and photovoltaic data streams, provides valuable information on these classification problems and the corresponding drift problems, and establishes a significant link between clean energy production and data processing. It is expected that countermeasures for the concept drift of data streams will be applied to wind–photovoltaic data stream classification and fault diagnosis to improve the accuracy and reliability of the results, contributing to more efficient clean energy generation and reduced carbon emissions.

Author Contributions

Conceptualization, X.Z. (Xinchun Zhu), Y.W. (Yang Wu), X.Z. (Xu Zhao), Y.Y., S.L., L.S. and Y.W. (Yelong Wu); methodology, X.Z. (Xinchun Zhu); formal analysis, Y.W. (Yang Wu), X.Z. (Xu Zhao), L.S. and Y.W. (Yelong Wu); investigation, Y.Y., L.S. and S.L.; resources, X.Z. (Xinchun Zhu) and Y.W. (Yang Wu); data curation, L.S. and S.L.; writing—original draft preparation, X.Z. (Xinchun Zhu), Y.W. (Yang Wu), L.S. and Y.W. (Yelong Wu); writing—review and editing, X.Z. (Xinchun Zhu), X.Z. (Xu Zhao) and Y.Y.; visualization, S.L., L.S. and Y.W. (Yelong Wu); supervision, X.Z. (Xinchun Zhu) and Y.W. (Yang Wu); project administration, X.Z. (Xinchun Zhu) and Y.W. (Yang Wu). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Program of China Southern Power Grid Co., Ltd. (grant number YNKJXM20222173) and the Reserve Talents Program for Middle-Aged and Young Leaders of Disciplines in Science and Technology of Yunnan Province, China (grant number 202105AC160014).

Data Availability Statement

Data are contained within the article.

Acknowledgments

The study was financially supported by the Science and Technology Program of China Southern Power Grid Co., Ltd. (grant number YNKJXM20222173) and the Reserve Talents Program for Middle-Aged and Young Leaders of Disciplines in Science and Technology of Yunnan Province, China (grant number 202105AC160014).

Conflicts of Interest

The authors declare that this study received funding from China Southern Power Grid Co., Ltd. The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article or the decision to submit it for publication.

Abbreviations

The following abbreviations are used in this manuscript:
ADWIN: Adaptive Sliding Window Algorithm
BP: Back Propagation
CM: Competence Model-Based Drift Detection
CEVDT: Concept-Adapting Evolutionary Algorithm for Decision Tree
CVFDT: Concept-Adapting Very Fast Decision Tree
CFS: Correlation-Based Approach to Attribute Selection
CUSUM: Cumulative Sum
DDM: Drift Detection Method
DELM: Dynamic Extreme Learning Machine
DWM: Dynamic Weighted Majority
EDDM: Early Drift Detection Method
e-Detector: Ensemble of Detectors
EDE: Equal Density Estimation
ECDD: EWMA for Concept Drift Detection
EWMA: Exponentially Weighted Moving Average Charts
FHDDM: Fast Hoeffding Drift Detection Method
FW-DDM: Fuzzy Windowing Drift Detection Method
GAN: Generative Adversarial Networks
HDDM: Hoeffding’s Inequality Based Drift Detection Method
HCDTs: Hierarchical Change-Detection Tests
HHT-CU: Hierarchical Hypothesis Testing with Classification Uncertainty
HLFR: Hierarchical Linear Four Rate
HGA: Hybrid Genetic Algorithms
IV-Jac: Information Value and Jaccard Similarity
ITA: Information-Theoretic Approach
ID: Instance Drift
KS: Kolmogorov–Smirnov
KL: Kullback–Leibler
LLDD: Learning with Local Drift Detection
LFR: Linear Four Rate Drift Detection
LDD-DSDA: Local Drift Degree-Based Density Synchronized Drift Adaptation
LSTM: Long Short-Term Memory
NLA: Normalized Likelihood Allocation
OCDD: One-Class Drift Detector
OGMMF-VRD: Online Gaussian Mixture Model with Noise Filter for Handling Virtual and Real Concept Drifts
OS-PCA: Oversampling Principal Component Analysis
PL: Paired Learners
PV: Photovoltaic
RDDM: Reactive Drift Detection Method
SCD: Statistical Change Detection
STEPD: Statistical Test of Equal Proportion Detection
SVMs: Support Vector Machines
UCVFDT: Uncertainty-Handling and Concept-Adapting Very Fast Decision Tree
VFDT: Very Fast Decision Tree
WTGs: Wind Turbine Generators

References

  1. IRENA. Renewable Electricity Capacity and Generation Statistics. 2023. Available online: https://www.irena.org/-/media/Files/IRENA/Agency/Publication/2023/Mar/IRENA_RE_Capacity_Statistics_2023.pdf (accessed on 1 August 2024).
  2. Fu-jiang, A.O.; Zong-feng, Q.I.; Bin, C.; Ke-di, H. Data Streams Mining Techniques and its Application in Simulation System. Comput. Sci. 2009, 36, 116. [Google Scholar]
  3. Krawczyk, B.; Minku, L.L.; Gama, J.; Stefanowski, J.; Wozniak, M. Ensemble learning for data stream analysis: A survey. Inf. Fusion 2017, 37, 132–156. [Google Scholar] [CrossRef]
  4. Hossain, M.L.; Abu-Siada, A.; Muyeen, S.M. Methods for Advanced Wind Turbine Condition Monitoring and Early Diagnosis: A Literature Review. Energies 2018, 11, 1309. [Google Scholar] [CrossRef]
  5. Ozturk, S. Forecasting Wind Turbine Failures and Associated Costs: Investigating Failure Causes, Effects and Criticalities, Modeling Reliability and Predicting Time-to-Failure, Time-to-Repair and Cost of Failures for Wind Turbines Using Reliability Methods and Machine Learning Techniques; ProQuest LLC: Ann Arbor, MI, USA, 2019. [Google Scholar]
  6. Jankowski, D.; Jackowski, K.; Cyganek, B. Learning Decision Trees from Data Streams with Concept Drift. In Proceedings of the 16th Annual International Conference on Computational Science (ICCS), San Diego, CA, USA, 6–8 June 2016; pp. 1682–1691. [Google Scholar]
  7. Rutkowski, L.; Jaworski, M.; Pietruczuk, L.; Duda, P. The CART decision tree for mining data streams. Inf. Sci. 2014, 266, 1–15. [Google Scholar] [CrossRef]
  8. Bodyanskiy, Y.; Vynokurova, O.; Pliss, I.; Setlak, G.; Mulesa, P. Fast Learning Algorithm for Deep Evolving GMDH-SVM Neural Network in Data Stream Mining Tasks. In Proceedings of the 1st IEEE International Conference on Data Stream Mining and Processing (DSMP), Lviv, Ukraine, 23–27 August 2016; pp. 257–262. [Google Scholar]
  9. Borchani, H.; Larrañaga, P.; Gama, J.; Bielza, C. Mining multi-dimensional concept-drifting data streams using Bayesian network classifiers. Intell. Data Anal. 2016, 20, 257–280. [Google Scholar] [CrossRef]
  10. Kruschke, J.K.; Liddell, T.M. Bayesian data analysis for newcomers. Psychon. Bull. Rev. 2018, 25, 155–177. [Google Scholar] [CrossRef]
  11. Geiger, D. An entropy-based learning algorithm of Bayesian conditional trees. In Proceedings of the Eighth Conference on Uncertainty in Artificial Intelligence, Stanford, CA, USA, 17–19 July 1992; pp. 92–97. [Google Scholar]
  12. Friedman, N.; Geiger, D.; Goldszmidt, M. Bayesian network classifiers. Mach. Learn. 1997, 29, 131–163. [Google Scholar] [CrossRef]
  13. Webb, G.I.; Boughton, J.R.; Wang, Z.H. Not so naive Bayes: Aggregating one-dependence estimators. Mach. Learn. 2005, 58, 5–24. [Google Scholar] [CrossRef]
  14. Bai, Y.; Wang, H.S.; Wu, J.; Zhang, Y.; Jiang, J.; Long, G.D. Evolutionary Lazy Learning for Naive Bayes Classification. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016; pp. 3124–3129. [Google Scholar]
  15. Jiang, L.X.; Wang, D.H.; Cai, Z.H. Discriminatively Weighted Naive Bayes and Its Application in Text Classification. Int. J. Artif. Intell. Tools 2012, 21, 1250007. [Google Scholar] [CrossRef]
  16. Jiang, L.; Cai, Z.; Wang, D. Improving naive Bayes for classification. Int. J. Comput. Appl. 2010, 32, 328–332. [Google Scholar] [CrossRef]
  17. Jiang, L.X.; Zhang, H.R.; Su, J. Instance cloning local naive Bayes. In Advances in Artificial Intelligence, Proceedings of the 18th Conference of the Canadian Society for Computational Studies of Intelligence, Canadian AI 2005, Victoria, BC, Canada, 9–11 May 2005; Lecture Notes in Computer Science; Kegl, B., Lapalme, G., Eds.; Springer: Berlin/Heidelberg, Germany, 2005; Volume 3501, pp. 280–291. [Google Scholar]
  18. Zhang, Y.S.; Wu, J.; Zhou, C.; Cai, Z.H. Instance cloned extreme learning machine. Pattern Recognit. 2017, 68, 52–65. [Google Scholar] [CrossRef]
  19. Langley, P.; Sage, S. Induction of selective Bayesian classifiers. In Uncertainty Proceedings 1994; Elsevier: Amsterdam, The Netherlands, 1994; pp. 399–406. [Google Scholar]
  20. Domingos, P.; Pazzani, M. On the optimality of the simple Bayesian classifier under zero-one loss. Mach. Learn. 1997, 29, 103–130. [Google Scholar] [CrossRef]
  21. Bidi, N.; Elberrichi, Z. Feature Selection for Text Classification Using Genetic Algorithms. In Proceedings of the 8th International Conference on Modelling, Identification and Control (ICMIC), Algiers, Algeria, 15–17 November 2016; pp. 806–810. [Google Scholar]
  22. Dubey, V.K.; Saxena, A.K.; Shrivas, M.M. A Cluster-Filter Feature Selection Approach. In Proceedings of the International Conference on ICT in Business Industry and Government (ICTBIG), Indore, India, 18–19 November 2016. [Google Scholar]
  23. Chuang, L.Y.; Tsai, S.W.; Yang, C.H. Improved binary particle swarm optimization using catfish effect for feature selection. Expert Syst. Appl. 2011, 38, 12699–12707. [Google Scholar] [CrossRef]
  24. Oh, I.S.; Lee, J.S.; Moon, B.R. Hybrid genetic algorithms for feature selection. IEEE Trans. Pattern Anal. Mach. Intell. 2004, 26, 1424–1437. [Google Scholar] [CrossRef] [PubMed]
  25. Unler, A.; Murat, A.; Chinnam, R.B. mr2PSO: A maximum relevance minimum redundancy feature selection method based on swarm intelligence for support vector machine classification. Inf. Sci. 2011, 181, 4625–4641. [Google Scholar] [CrossRef]
  26. Yan, X.S.; Wu, Q.H.; Sheng, V.S. A Double Weighted Naive Bayes with Niching Cultural Algorithm for Multi-Label Classification. Int. J. Pattern Recognit. Artif. Intell. 2016, 30, 1650013. [Google Scholar] [CrossRef]
  27. Jia, W.; Zhihua, C. Attribute weighting via differential evolution algorithm for attribute weighted naive bayes (wnb). J. Comput. Inf. Syst. 2011, 7, 1672–1679. [Google Scholar]
  28. Jiang, Q.W.; Wang, W.; Han, X.; Zhang, S.S.; Wang, X.Y.; Wang, C. Deep Feature Weighting in Naive Bayes for Chinese Text Classification. In Proceedings of the 4th IEEE International Conference on Cloud Computing and Intelligence Systems (IEEE CCIS), Beijing, China, 17–19 August 2016; pp. 160–164. [Google Scholar]
  29. Taheri, S.; Yearwood, J.; Mammadov, M.; Seifollahi, S. Attribute weighted Naive Bayes classifier using a local optimization. Neural Comput. Appl. 2014, 24, 995–1002. [Google Scholar] [CrossRef]
  30. Kia, S.H.; Henao, H.; Capolino, G.A. Mechanical Transmission and Torsional Vibration Effects on Induction Machine Stator Current and Torque in Railway Traction Systems. In Proceedings of the IEEE Energy Conversion Congress and Exposition, San Jose, CA, USA, 20–24 September 2009; pp. 2643–2648. [Google Scholar]
  31. Stack, J.R.; Habetler, T.G.; Harley, R.G. Fault classification and fault signature production for rolling element bearings in electric machines. IEEE Trans. Ind. Appl. 2004, 40, 735–739. [Google Scholar] [CrossRef]
  32. Gong, X. Online Nonintrusive Condition Monitoring and Fault Detection for Wind Turbines; ProQuest LLC: Ann Arbor, MI, USA, 2012. [Google Scholar]
  33. Lin, T.; Yang, X.; Cai, R.Q.; Zhang, L.; Liu, G.; Liao, W.Z. Fault diagnosis of wind turbine based on Elman neural network trained by artificial bee colony algorithm. Renew. Energy Resour. 2019, 37, 612–617. [Google Scholar] [CrossRef]
  34. Liang, T.; Zhang, Y.J. Monitoring of wind turbine faults based on wind turbine power curve. Renew. Energy Resour. 2018, 36, 302–308. [Google Scholar] [CrossRef]
  35. Li, Z.Y.; Yu, J.F.; Chen, Y.G.; Wen, D.Z. Research on the Fault Diagnosis Technology for Direct-drive Wind Turbines Based on Characteristic Current. Control Inf. Technol. 2018, 76–80. [Google Scholar] [CrossRef]
  36. Caesarendra, W.; Kosasih, B.; Lieu, A.K.; Moodie, C.A.S. Application of the largest Lyapunov exponent algorithm for feature extraction in low speed slew bearing condition monitoring. Mech. Syst. Signal Process. 2015, 50–51, 116–138. [Google Scholar] [CrossRef]
  37. Tang, B.P.; Song, T.; Li, F.; Deng, L. Fault diagnosis for a wind turbine transmission system based on manifold learning and Shannon wavelet support vector machine. Renew. Energy 2014, 62, 1–9. [Google Scholar] [CrossRef]
  38. Barszcz, T.; Randall, R.B. Application of spectral kurtosis for detection of a tooth crack in the planetary gear of a wind turbine. Mech. Syst. Signal Process. 2009, 23, 1352–1365. [Google Scholar] [CrossRef]
  39. Liu, Q.; Yang, J.; Yin, Z. Fault diagnosis of wind turbine gearbox using dual-tree complex wavelet decomposition. J. Beijing Jiaotong Univ. 2018, 42, 121–125. [Google Scholar]
  40. Zhang, X.; Zheng, L.; Liu, Z. The Fault Diagnosis of Wind Turbine Gearbox Based on Genetic Algorithm to Optimize BP Neural Network. J. Hunan Inst. Eng. (Nat. Sci. Ed.) 2018, 28, 1–6. [Google Scholar] [CrossRef]
  41. Guo, D.; Wang, L.; Guo, H.; Wu, W.; Han, X. Fault Diagnosis of Wind Power Generator Based on Improved Wavelet and BP NN. Proc. Electr. Power Syst. Autom. 2012, 24, 53–58. [Google Scholar]
  42. Shi, X. Anomaly Detection and Early Warning of Photovoltaic Array based on Data Mining; Shandong University: Jinan, China, 2019. [Google Scholar]
  43. Awudu, I.; Wilson, W.; Dahl, B. Hedging strategy for ethanol processing with copula distributions. Energy Econ. 2016, 57, 59–65. [Google Scholar] [CrossRef]
  44. Ahmadi Livani, M.; Abadi, M.; Alikhany, M.; Yadollahzadeh Tabari, M. Outlier detection in wireless sensor networks using distributed principal component analysis. J. AI Data Min. 2013, 1, 1–11. [Google Scholar]
  45. Park, H.J.; Kim, S.; Han, S.Y.; Ham, S.; Park, K.J.; Choi, J.H. Machine Health Assessment Based on an Anomaly Indicator Using a Generative Adversarial Network. Int. J. Precis. Eng. Manuf. 2021, 22, 1113–1124. [Google Scholar] [CrossRef]
  46. Shang, Y. Study on Photovoltaic Power Short-Term Forecast Based on Improved GRNN; Nanjing University of Posts and Telecommunications: Nanjing, China, 2018. [Google Scholar]
  47. Zhang, X. Research on Large-Scale PV Array Power Simulation System and Fault Diagnosis Technology; Qinghai University: Xining, China, 2016. [Google Scholar]
  48. Spataru, S.; Sera, D.; Kerekes, T.; Teodorescu, R. Diagnostic method for photovoltaic systems based on light I–V measurements. Sol. Energy 2015, 119, 29–44. [Google Scholar] [CrossRef]
  49. Yan, T. Development of Fault Monitoring System for Photovoltaic Module in Solar Power Station; Jiangsu University: Zhenjiang, China, 2019. [Google Scholar]
  50. Et-taleby, A.; Chaibi, Y.; Boussetta, M.; Allouhi, A.; Benslimane, M. A novel fault detection technique for PV systems based on the K-means algorithm, coded wireless Orthogonal Frequency Division Multiplexing and thermal image processing techniques. Sol. Energy 2022, 237, 365–376. [Google Scholar] [CrossRef]
  51. Akram, M.W.; Li, G.Q.; Jin, Y.; Chen, X. Failures of Photovoltaic modules and their Detection: A Review. Appl. Energy 2022, 313, 118822. [Google Scholar] [CrossRef]
  52. Shimodaira, H. Improving predictive inference under covariate shift by weighting the log-likelihood function. J. Stat. Plan. Infer. 2000, 90, 227–244. [Google Scholar] [CrossRef]
  53. Moreno-Torres, J.G.; Raeder, T.; Alaiz-Rodríguez, R.; Chawla, N.V.; Herrera, F. A unifying view on dataset shift in classification. Pattern Recognit. 2012, 45, 521–530. [Google Scholar] [CrossRef]
  54. Joaquin, Q.-C.; Masashi, S.; Anton, S.; Neil, D.L. When Training and Test Sets Are Different: Characterizing Learning Transfer. In Dataset Shift in Machine Learning; MIT Press: Cambridge, MA, USA, 2009; pp. 3–28. [Google Scholar]
  55. Schlimmer, J.C.; Granger, R.H. Incremental learning from noisy data. Mach. Learn. 1986, 1, 317–354. [Google Scholar] [CrossRef]
  56. Liu, A.J.; Song, Y.L.; Zhang, G.Q.; Lu, J. Regional Concept Drift Detection and Density Synchronized Drift Adaptation. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), Melbourne, Australia, 19–25 August 2017; pp. 2280–2286. [Google Scholar]
  57. Tantithamthavorn, C.; Hassan, A.E.; Matsumoto, K. The Impact of Class Rebalancing Techniques on the Performance and Interpretation of Defect Prediction Models. IEEE Trans. Softw. Eng. 2020, 46, 1200–1219. [Google Scholar] [CrossRef]
  58. Witten, I.H.; Frank, E.; Hall, M.A. Data Mining: Practical Machine Learning Tools and Techniques; Morgan Kaufmann: Boston, MA, USA, 2011; pp. 587–605. [Google Scholar]
  59. Zhang, Y.H.; Chu, G.; Li, P.P.; Hu, X.G.; Wu, X.D. Three-layer concept drifting detection in text data streams. Neurocomputing 2017, 260, 393–403. [Google Scholar] [CrossRef]
  60. Thabtah, F.; Hammoud, S.; Kamalov, F.; Gonsalves, A. Data imbalance in classification: Experimental evaluation. Inf. Sci. 2020, 513, 429–441. [Google Scholar] [CrossRef]
  61. Tsymbal, A.; Pechenizkiy, M.; Cunningham, P.; Puuronen, S. Dynamic integration of classifiers for handling concept drift. Inf. Fusion 2008, 9, 56–68. [Google Scholar] [CrossRef]
  62. Yu, S.J.; Wang, X.Y.; Príncipe, J.C. Request-and-Reverify: Hierarchical Hypothesis Testing for Concept Drift Detection with Expensive Labels. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI), Stockholm, Sweden, 13–19 July 2018; pp. 3033–3039. [Google Scholar]
  63. Dasu, T.; Krishnan, S.; Venkatasubramanian, S. An Information-Theoretic Approach to Detecting Changes in Multi-Dimensional Data Streams; In Interfaces; AT&T Labs: Atlanta, GA, USA, 2006. [Google Scholar]
  64. Lu, N.; Zhang, G.Q.; Lu, J. Concept drift detection via competence models. Artif. Intell. 2014, 209, 11–28. [Google Scholar] [CrossRef]
  65. Kifer, D.; Ben-David, S.; Gehrke, J. Detecting Change in Data Streams. VLDB Endow. 2004, 230, 108–133. [Google Scholar]
  66. Lu, N.; Lu, J.; Zhang, G.Q.; de Mantaras, R.L. A concept drift-tolerant case-base editing technique. Artif. Intell. 2016, 230, 108–133. [Google Scholar] [CrossRef]
  67. Gama, J.; Medas, P.; Castillo, G.; Rodrigues, P. Learning with drift detection. In Advances in Artificial Intelligence—Sbia 2004, Proceedings of the 17th Brazilian Symposium on Artificial Intelligence, Sao Luis, Maranhao, Brazil, 29 September–1 October 2004; Lecture Notes in Artificial Intelligence; Bazzan, A.L.C., Labidi, S., Eds.; Springer: Berlin/Heidelberg, Germany, 2004; Volume 3171, pp. 286–295. [Google Scholar]
  68. Herbold, S.; Trautsch, A.; Grabowski, J. Global vs. local models for cross-project defect prediction A replication study. Empir. Softw. Eng. 2017, 22, 1866–1902. [Google Scholar] [CrossRef]
  69. Baena-Garcıa, M.; Campo-Avila, J.d.; Fidalgo, R.; Bifet, A.; Gavalda, R.; Morales-Bueno, R. Early drift detection method. In Proceedings of the Fourth International Workshop on Knowledge Discovery from Data Streams, Philadelphia, PA, USA, 20 August 2006; pp. 77–86. [Google Scholar]
  70. Ross, G.J.; Adams, N.M.; Tasoulis, D.K.; Hand, D.J. Exponentially weighted moving average charts for detecting concept drift. Pattern Recognit. Lett. 2012, 33, 191–198. [Google Scholar] [CrossRef]
  71. Barros, R.S.M.; Cabral, D.R.L.; Gonçalves, P.M.; Santos, S. RDDM: Reactive drift detection method. Expert Syst. Appl. 2017, 90, 344–355. [Google Scholar] [CrossRef]
  72. Liu, A.J.; Zhang, G.Q.; Lu, J. Fuzzy Time Windowing for Gradual Concept Drift Adaptation. In Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Naples, Italy, 9–12 July 2017. [Google Scholar]
  73. Frías-Blanco, I.; del Campo-Avila, J.; Ramos-Jiménez, G.; Morales-Bueno, R.; Ortiz-Díaz, A.; Caballero-Mota, Y. Online and Non-Parametric Drift Detection Methods Based on Hoeffding’s Bounds. IEEE Trans. Knowl. Data Eng. 2015, 27, 810–823. [Google Scholar] [CrossRef]
  74. Gama, J.; Castillo, G. Learning with local drift detection. In Advanced Data Mining and Applications, Proceedings of the International Conference on Advanced Data Mining and Applications, Berlin, Heidelberg, 14 August 2006; Lecture Notes in Artificial Intelligence; Li, X., Zaiane, O.R., Li, Z.H., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; Volume 4093, pp. 42–55. [Google Scholar]
  75. Xu, S.L.; Wang, J.H. Dynamic extreme learning machine for data stream classification. Neurocomputing 2017, 238, 433–449. [Google Scholar] [CrossRef]
  76. Bifet, A.; Gavaldà, R. Learning from Time-Changing Data with Adaptive Windowing. In Proceedings of the 7th SIAM International Conference on Data Mining, Minneapolis, MN, USA, 26–28 April 2007; Volume 7. [Google Scholar]
  77. Gözuaçik, Ö.; Can, F. Concept learning using one-class classifiers for implicit drift detection in evolving data streams. Artif. Intell. Rev. 2021, 54, 3725–3747. [Google Scholar] [CrossRef]
  78. Bach, S.H.; Maloof, M.A. Paired Learners for Concept Drift. In Proceedings of the 8th IEEE International Conference on Data Mining, Pisa, Italy, 15–19 December 2008; pp. 23–32. [Google Scholar]
  79. Nishida, K.; Yamauchi, K. Detecting concept drift using statistical testing. In Proceedings of the 10th International Conference on Discovery Science, Sendai, Japan, 1–4 October 2007; pp. 264–269. [Google Scholar]
  80. Pesaranghader, A.; Viktor, H.; Paquet, E. Reservoir of diverse adaptive learners and stacking fast hoeffding drift detection methods for evolving data streams. Mach. Learn. 2018, 107, 1711–1743. [Google Scholar] [CrossRef]
  81. Yu, H.; Liu, W.; Lu, J.; Wen, Y.; Luo, X.; Zhang, G. Detecting group concept drift from multiple data streams. Pattern Recognit. 2023, 134, 109113. [Google Scholar] [CrossRef]
  82. Rad, R.H.; Haeri, M.A. Hybrid forest: A concept drift aware data stream mining algorithm. arXiv 2019, arXiv:1902.03609. [Google Scholar]
  83. Song, X.Y.; Wu, M.X.; Jermaine, C.; Ranka, S. Statistical Change Detection for Multi-Dimensional Data. In Proceedings of the 13th International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, 12–15 August 2007; pp. 667–676. [Google Scholar]
  84. Alippi, C.; Roveri, M. Just-in-time adaptive classifiers—Part I: Detecting nonstationary changes. IEEE Trans. Neural Netw. 2008, 19, 1145–1153. [Google Scholar] [CrossRef]
  85. Wang, H.; Abraham, Z. Concept Drift Detection for Streaming Data. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), Killarney, Ireland, 12–17 July 2015. [Google Scholar]
  86. Du, L.; Song, Q.B.; Zhu, L.; Zhu, X.Y. A Selective Detector Ensemble for Concept Drift Detection. Comput. J. 2015, 58, 457–471. [Google Scholar] [CrossRef]
  87. Maciel, B.I.F.; Santos, S.; Barros, R.S.M. A Lightweight Concept Drift Detection Ensemble. In Proceedings of the 27th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), Vietri sul Mare, Italy, 9–11 November 2015; pp. 1061–1068. [Google Scholar]
  88. Alippi, C.; Boracchi, G.; Roveri, M. Hierarchical Change-Detection Tests. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 246–258. [Google Scholar] [CrossRef]
  89. Yu, S.J.; Abraham, Z.; Wang, H.; Shah, M.; Wei, Y.T.; Príncipe, J.C. Concept drift detection and adaptation with hierarchical hypothesis testing. J. Frankl. Inst.-Eng. Appl. Math. 2019, 356, 3187–3215. [Google Scholar] [CrossRef]
  90. Raza, H.; Prasad, G.; Li, Y.H. EWMA model based shift-detection methods for detecting covariate shifts in non-stationary environments. Pattern Recognit. 2015, 48, 659–669. [Google Scholar] [CrossRef]
  91. Feng, G.; Zhang, G.; Jie, L.; Chin-Teng, L. Concept drift detection based on equal density estimation. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016; pp. 24–30. [Google Scholar]
  92. Hulten, G.; Spencer, L.; Domingos, P. Mining time-changing data streams. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 26–29 August 2001; pp. 97–106. [Google Scholar]
  93. Domingos, P.; Hulten, G. Mining high-speed data streams. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, 20–23 August 2000; pp. 71–80. [Google Scholar]
  94. Manapragada, C.; Webb, G.; Salehi, M. Extremely Fast Decision Tree. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 1953–1962. [Google Scholar]
  95. Jankowski, D.; Jackowski, K.; Cyganek, B. Learning Decision Trees from Data Streams with Concept Drift. Procedia Comput. Sci. 2016, 80, 1682–1691. [Google Scholar] [CrossRef]
  96. Liang, C.; Zhang, Y.; Shi, P.; Hu, Z. Learning very fast decision tree from uncertain data streams with positive and unlabeled samples. Inf. Sci. 2012, 213, 50–67. [Google Scholar] [CrossRef]
  97. Kolter, J.Z.; Maloof, M.A. Dynamic weighted majority: A new ensemble method for tracking concept drift. In Proceedings of the 3rd IEEE International Conference on Data Mining, Melbourne, FL, USA, 19–22 November 2003; pp. 123–130. [Google Scholar]
  98. Elwell, R.; Polikar, R. Incremental Learning of Concept Drift in Nonstationary Environments. IEEE Trans. Neural Netw. 2011, 22, 1517–1531. [Google Scholar] [CrossRef]
  99. Oliveira, G.; Minku, L.L.; Oliveira, A.L.I. Tackling Virtual and Real Concept Drifts: An Adaptive Gaussian Mixture Model Approach. IEEE Trans. Knowl. Data Eng. 2023, 35, 2048–2060. [Google Scholar] [CrossRef]
  100. Severiano, C.A.; Silva, P.C.d.L.e.; Weiss Cohen, M.; Guimarães, F.G. Evolving fuzzy time series for spatio-temporal forecasting in renewable energy systems. Renew. Energy 2021, 171, 764–783. [Google Scholar] [CrossRef]
  101. Zhang, L.; Zhu, J.; Zhang, D.; Liu, Y. An incremental photovoltaic power prediction method considering concept drift and privacy protection. Appl. Energy 2023, 351, 121919. [Google Scholar] [CrossRef]
  102. Li, J.; Yu, H.; Zhang, Z.; Luo, X.; Xie, S. Concept Drift Adaptation by Exploiting Drift Type. ACM J. 2024, 18, 1–12. [Google Scholar] [CrossRef]
  103. Cabello-López, T.; Cañizares-Juan, M.; Carranza-García, M.; Garcia-Gutiérrez, J.; Riquelme, J.C. Concept Drift Detection to Improve Time Series Forecasting of Wind Energy Generation. In Hybrid Artificial Intelligent Systems, Proceedings of the 17th International Conference, HAIS 2022, Salamanca, Spain, 5–7 September 2022; Springer International Publishing: Cham, Switzerland, 2022; pp. 133–140. [Google Scholar]
  104. Wu, H.; Elizaveta, D.; Zhadan, A.; Petrosian, O. Forecasting online adaptation methods for energy domain. Eng. Appl. Artif. Intell. 2023, 123, 106499. [Google Scholar] [CrossRef]
  105. Lee, H.; Lee, J.-G.; Kim, N.-W.; Lee, B.-T. Model-agnostic online forecasting for PV power output. IET Renew. Power Gener. 2021, 15, 3539–3551. [Google Scholar] [CrossRef]
Figure 1. Installed solar and wind energy capacity.
Figure 2. An example of covariate drift.
Figure 3. An example of prior probability drift.
Figure 4. (a) An example of feature drift; (b) an example of instance drift.
Figure 5. (a) An example of abrupt drift; (b) an example of gradual drift.
Table 1. Summary of drift detection algorithms.
Error rate-based: DDM [67], EDDM [69], ECDD [70], RDDM [71], FW-DDM [72], HDDM [73], LLDD [74], DELM [75]
Window-based: ADWIN [76], OCDD [77], PL [78], STEPD [79], FHDDM [80], GDDM [81]
Data distribution-based: ITA [63], SCD [84], CM [56], EDE [91], LDD-DSDA [66]
Multiple hypothesis testing: JIT [84], LFR [85], IV-Jac [59], e-Detector [86], DDE [87], HCDTs [87], HLFR [89], HHT-CU [62]
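To illustrate the error rate-based family in Table 1, the following is a simplified, self-contained sketch of the DDM test [67]: it tracks the classifier's running error rate p and its standard deviation s, records their minimum, and signals a warning at p + s ≥ p_min + 2·s_min and a drift at p + s ≥ p_min + 3·s_min. This is a didactic sketch, not a reference implementation; the `min_samples` warm-up length is a common but configurable assumption.

```python
import math


class DDM:
    """Simplified sketch of the Drift Detection Method (error rate-based).

    For each incoming prediction outcome (1 = error, 0 = correct) it
    updates the running error rate p and its std s, keeps the minimum
    observed (p_min, s_min), and flags:
      warning if p + s >= p_min + 2 * s_min
      drift   if p + s >= p_min + 3 * s_min
    """

    def __init__(self, min_samples=30):
        self.min_samples = min_samples  # warm-up before testing
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 1.0                 # running error rate
        self.s = 0.0                 # std of the error rate
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, error):
        """error: 1 if the last prediction was wrong, else 0.
        Returns 'drift', 'warning', or 'stable'."""
        self.n += 1
        # incremental mean of a Bernoulli (0/1) error stream
        self.p += (error - self.p) / self.n
        self.s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        # remember the best (lowest) operating point seen so far
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        if self.p + self.s >= self.p_min + 3.0 * self.s_min:
            self.reset()             # drift confirmed: restart statistics
            return "drift"
        if self.p + self.s >= self.p_min + 2.0 * self.s_min:
            return "warning"
        return "stable"


# Usage sketch: a stream with a ~10% error rate, followed by a sudden
# degradation to 100% errors, which DDM reports as drift within a few steps.
detector = DDM()
stable_states = [detector.update(1 if i % 10 == 0 else 0) for i in range(1, 301)]
degraded_states = [detector.update(1) for _ in range(100)]
```

In the stable phase no drift is signalled, while the degraded phase trips the 3-sigma threshold; in a streaming classifier, the "warning" state is typically used to start buffering recent samples so a replacement model can be trained before "drift" is confirmed.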
Share and Cite

MDPI and ACS Style

Zhu, X.; Wu, Y.; Zhao, X.; Yang, Y.; Liu, S.; Shi, L.; Wu, Y. Overview of Wind and Photovoltaic Data Stream Classification and Data Drift Issues. Energies 2024, 17, 4371. https://doi.org/10.3390/en17174371
