Robustness of Short-Term Wind Power Forecasting against False Data Injection Attacks

Abstract: The accuracy of wind power forecasting depends heavily on data quality, which is highly susceptible to cyberattacks. In this paper, we study the cybersecurity of short-term wind power forecasting. We present one class of data attacks, called false data injection attacks, against deterministic and probabilistic wind power forecasting. We show that any malicious data can be injected into historical data without being discovered by one of the commonly used anomaly detection techniques. Moreover, we show that attackers can launch such data attacks even with limited resources. To study the impact of data attacks on forecasting accuracy, we establish a framework for simulating false data injection attacks using the Monte Carlo method. Then, the robustness of six representative wind power forecasting models is tested. Numerical results on real-world data demonstrate that, among the six representative models, the support vector machine and k-nearest neighbors combined with kernel density estimator are the most robust deterministic and probabilistic forecasting models, respectively. Nevertheless, none of them can issue accurate forecasts under very strong false data attacks. This presents a serious challenge to the wind power forecasting community: developing wind power forecasting models that are robust to false data attacks.


Introduction
Traditionally, the power system is seen as a physical system that generates, transmits, and delivers electricity. With the development of information and communication technology, the cyber system now plays an increasingly important role in situation awareness and security control in modern power systems [1]. However, the malicious cyberattack targeting the Ukrainian electricity grid in 2015 [2] made governments realize for the first time that a failure of the cyber system could also damage the physical power system. Since then, increasing attention has been paid to the cybersecurity of power systems [3].
Energy forecasting (including load, electricity price, and renewable generation forecasting) is also a very important component of cyber-physical power systems, especially for modern power systems with a high penetration of renewable energy. It is well known that power system operations rely heavily on accurate forecasts, such as load and renewable energy forecasts [4-6]. Unfortunately, as first pointed out by Luo et al. [7], energy forecasting is very vulnerable to false data attacks (one type of cyberattack), because input data quality directly affects forecast accuracy. Through false data attacks, hackers can inject malicious data into the historical (input) data and thereby significantly deteriorate forecast quality. Thus, the cybersecurity of energy forecasting is now an important emerging problem in power system research.

The main contributions of this paper are as follows.
1. A false data injection attack approach against wind power forecasting is developed, in which the attacker can inject any amount of malicious data into wind data without being detected by the least-squares-based anomaly detection tool.
2. A Monte Carlo simulation framework is established to simulate false data injection attacks on wind power data and meteorological data. This framework can be used to evaluate the robustness of any wind power forecasting model.
3. The accuracy of six representative wind power forecasting approaches (three deterministic and three probabilistic) is benchmarked under different attack intensities and different attack targets (wind power data and meteorological data).
Energies 2020, 13, 3780 3 of 22
To the best of our knowledge, this is the first effort to study the cybersecurity issue of wind power forecasting and systematically evaluate the robustness of some wind power forecasting models under false data attacks.
The remainder of this paper is organized as follows. Section 2 briefly introduces the architecture of the wind energy management system and proposes potential false data attack scenarios. Section 3 presents the principles of false data injection attacks. Section 4 reviews six representative short-term wind power forecasting models. Section 5 provides a framework for assessing the robustness of wind power forecasting approaches. Section 6 describes the data and all model setups. Section 7 presents numerical results of the comparative study. Finally, Section 8 concludes the paper.

Cyber Attack Scenarios on Wind Energy Management System
Information and communication technology (ICT) is crucial for wind farm management [36]. The supervisory control and data acquisition (SCADA) system and the energy management system (EMS) are two typical ICT services for monitoring and operating wind farms [37]. In this section, we review the architecture of the wind farm SCADA/EMS system, analyze its cybersecurity, and then develop credible false data attack scenarios against it.

Architecture of Wind Farm SCADA/EMS System
The typical architecture of the wind farm SCADA/EMS system was introduced by Yan et al. [38] and Zhang et al. [39]. In this subsection, we first review the architecture of the SCADA system installed in each wind farm and then the architecture of the EMS that integrates and manages multiple wind farms. Figure 1 shows the representative architecture of the SCADA system installed in each wind farm. This system is used to monitor and operate all wind turbines in the farm and can communicate with every device in the farm. Each wind turbine is equipped with a control panel, known as the wind turbine control panel (WTCP). In the control room, all data received from meters or from outside sources (such as meteorological data) are stored in the database and transmitted to the application server. Wind farm operators can monitor the status of all wind turbines through the workstation.
On the other hand, some wind power companies operate multiple wind farms in different locations and therefore need an EMS to integrate and manage them. Figure 2 shows the representative architecture of the EMS. All wind farms are connected to the control center via a control wide area network (WAN), and fiber optics are usually chosen as the communication infrastructure for this control WAN. In this case, the control room in each farm might not be staffed. In the control center, several operator consoles display the information received from all SCADA servers, and all data are stored in the historian database.

Figure 2. Representative architecture of the wind energy management system (EMS).
Wind power forecasting tools are usually deployed in the application server of the SCADA or EMS system. If this tool is run in the SCADA system, it only provides the forecasting service for the target wind farm. However, if this tool is deployed in the EMS system, it can provide the forecasting service for all wind farms of one company. Data in the application server or data storage are utilized to train wind power forecasting models and issue the forecasting result.

Credible False Data Attack Scenarios
In this subsection, we identify vulnerabilities of the wind farm SCADA/EMS system and then develop multiple credible false data attack scenarios against it [40,41].

Scenario I: Attack on WTCP
The WTCP is mainly used by maintenance staff to obtain the operating status and data of wind turbines [38]. Because the WTCP is usually installed at the tower base, it is physically accessible to attackers. Although the WTCP is usually protected by a PIN, an attacker can still crack the PIN by brute-force search [38] or a chemical combination attack [42]. After that, the attacker can connect intrusion devices to the WTCP. Malicious data injected into the WTCP are then sent to the SCADA/EMS system, affecting the deployed wind power forecasting service.

Scenario II: Attack on Optical Fiber Cables
Optical fiber cables are usually used as communication links in the SCADA/EMS system [38]. Their transmission medium is glass or plastic through which light propagates. Optical fibers can be attacked by advanced tapping methods without being detected [43]: by installing taps on the cables, an attacker can inject additional light and deduce the underlying signal. Since attackers can easily gain physical access to fiber cables, this attack poses a very high risk to data integrity in the SCADA/EMS system, and attacks on fiber cables can therefore severely degrade wind power forecasting accuracy.

Scenario III: Attack on SCADA/EMS Servers
Servers are used to communicate, store data, and deploy applications in the SCADA/EMS system. However, servers can be compromised by internal attacks [38]. For example, a corrupt but authorized user with physical access to all servers can inject malicious data into the historical database. In addition, servers can be attacked via infected portable storage devices [38], through which malware such as spyware or Stuxnet is introduced into the servers. The infected servers are then controlled by attackers and used to inject malicious data into the historical database.

False Data Injection Attack against Wind Power Forecasting
In this section, we present one data attack mode, called false data injection attacks, against short-term wind power forecasting. First, we introduce an anomaly detection technique commonly used in data analysis, which can also be applied to data pre-processing in wind power forecasting. Then, we develop the false data injection attack approach, demonstrating that the attacker can inject any amount of potentially damaging malicious data without being detected by this anomaly detection technique. Note that both the anomaly detection technique and the false data injection attack approach are generic and not limited to short-term wind power forecasting. Finally, we show how to implement such a false data attack on wind power data and on meteorological data.

Least-Squares-Based Anomaly Detection Technique
From the perspective of data analysis, outliers are abnormal observations that deviate from normal behavior. In wind power forecasting, outliers can arise for several reasons, such as measurement errors, and they can seriously degrade forecasting accuracy. Therefore, several anomaly detection techniques have been proposed to detect and remove outliers from the original dataset.
The three-sigma method was first proposed in [44] to reject outliers in wind power time series data: data points deviating from the wind power curve by more than three standard deviations were identified as outliers. This idea was further developed in [45] to account for probabilistic wind power curves. In addition to these data-driven methods [44,45], image-driven methods based on wind power curve images were proposed in [46] to identify and clean abnormal wind turbine data. The spatial correlation of multiple adjacent wind farms was exploited in [47] to detect and recover the outliers of one wind farm. A density-based spatial clustering method was proposed in [48] to eliminate outliers caused by wind curtailment.
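A minimal sketch of the three-sigma rule attributed to [44] is given below. It assumes that the wind power curve has already been evaluated at each observation's wind speed (how [44] fits that curve is not reproduced here) and that a single standard deviation is computed over all deviations; the function name `three_sigma_outliers` is hypothetical.

```python
import numpy as np


def three_sigma_outliers(power, curve_power):
    """Flag points whose deviation from the wind power curve exceeds three standard deviations.

    power       : observed wind power, shape (m,)
    curve_power : power given by an (assumed, pre-fitted) wind power curve at the same wind speeds
    """
    deviation = np.asarray(power, dtype=float) - np.asarray(curve_power, dtype=float)
    sigma = deviation.std()                  # one global standard deviation (assumption)
    return np.abs(deviation) > 3.0 * sigma   # boolean mask, True = outlier


# Usage: a smooth curve plus one large spike at the last observation
curve = np.linspace(0.0, 1.0, 100)
observed = curve.copy()
observed[-1] += 10.0
mask = three_sigma_outliers(observed, curve)
print(np.flatnonzero(mask))   # prints [99]
```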
The anomaly detection techniques most commonly used by the power industry are based on regression models [8,49-52]. The main idea of regression-based anomaly detection methods is to first backcast the data using regression and then compare the fitted values with the original ones. If the deviation exceeds a given threshold, the corresponding observation is flagged as an outlier. To date, different regression methods, such as linear regression [51,52], dynamic regression [8], and non-parametric regression [49,50], have been proposed for different anomaly detection applications.
In this paper, we adopt one of the regression-based anomaly detection techniques commonly used by the power industry, referred to as the outlier detection technique based on least squares. For any time series forecasting problem, the original dataset generally consists of two parts, i.e., the input variables $x_t \in \mathbb{R}^n$ and the output variable $y_t \in \mathbb{R}$, where $n$ is the number of input variables. The pair $(x_t, y_t)$ is called the $t$-th example. The original dataset is a list of $m$ examples, i.e., $\{(x_t, y_t);\ t = 1, 2, \dots, m\}$.
Using least squares regression, the output variable $y \in \mathbb{R}$ is approximated by a linear function of the input variables $x \in \mathbb{R}^n$:

$$ y = \theta^{T} x, \tag{1} $$

where $\theta \in \mathbb{R}^n$ is an $n$-dimensional parameter vector. Given the original dataset, $\theta$ is estimated by minimizing the least-squares cost function:

$$ \min_{\theta} \sum_{t=1}^{m} \left( y_t - \theta^{T} x_t \right)^2. \tag{2} $$

The optimization problem (2) can be rewritten in matrix form. First, we define the design matrix $X \in \mathbb{R}^{m \times n}$ as the $m$-by-$n$ matrix that stacks the input variables $x_t$ of all examples in its rows. Similarly, we define the design vector $y \in \mathbb{R}^m$ containing the output variables $y_t$ of all examples:

$$ X = \begin{bmatrix} x_1^{T} \\ x_2^{T} \\ \vdots \\ x_m^{T} \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}. \tag{3} $$

It is then easy to verify that problem (2) is equivalent to

$$ \min_{\theta} \left\| y - X\theta \right\|_2^2. \tag{4} $$

Assuming $X$ has full column rank, the solution of problem (4) is given in closed form by

$$ \hat{\theta} = \left( X^{T} X \right)^{-1} X^{T} y. \tag{5} $$

Substituting $\hat{\theta}$ for $\theta$ in (1), we obtain the estimated value $\hat{y} = X\hat{\theta}$. Intuitively, for a dataset without outliers, the estimated value $\hat{y}$ is usually close to the observed value $y$, whereas for a dataset with outliers, the estimated value is far from the observed value. Following this idea, the observation residual (i.e., the difference between the observed and estimated values) is defined as

$$ r = y - X\hat{\theta}. \tag{6} $$

Its L2-norm $\| y - X\hat{\theta} \|_2$ is used to detect whether outliers exist. Specifically, given a threshold $\tau$ (reflecting the acceptable level of observation residuals), outliers are declared to exist in the original dataset when the L2-norm of the residual exceeds $\tau$, i.e., $\| y - X\hat{\theta} \|_2 > \tau$. How to choose a proper threshold $\tau$ is also very important but beyond the scope of this paper.
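The least-squares detector described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: `np.linalg.lstsq` is used instead of forming $(X^T X)^{-1}$ explicitly (the two coincide when $X$ has full column rank, but the former is numerically safer), the function names are hypothetical, and the threshold in the usage example is arbitrary.

```python
import numpy as np


def ls_residual_norm(X, y):
    """L2-norm of the least-squares residual ||y - X theta_hat||_2, with theta_hat as in (5)."""
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.linalg.norm(y - X @ theta_hat)


def ls_detect(X, y, tau):
    """Return True when outliers are declared, i.e. the residual norm exceeds tau."""
    return ls_residual_norm(X, y) > tau


# Usage on synthetic data: clean data pass, a single large spike is caught
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + 0.01 * rng.normal(size=50)
y_spiked = y.copy()
y_spiked[10] += 20.0

tau = 1.0   # illustrative threshold only
print(ls_detect(X, y, tau), ls_detect(X, y_spiked, tau))
```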

False Data Injection Attack Approach
In this part, we follow the idea of [53] and develop an alternative false data injection attack approach against wind power forecasting. We show that this attack mode cannot be detected by the anomaly detection technique introduced in the previous part. Furthermore, we show how to launch the data injection attack when the attacker has only very limited resources. Here, "limited resources" refers to the resources, such as computing, communication, and human resources, that attackers need to launch a data injection attack successfully.
It is assumed that the attacker has local (i.e., partial) information about the design matrix X of the target wind farm, obtained using the approaches shown in Section 2.2. Note that this assumption does not require the attacker to have full knowledge of X. Even if the attacker knows only local information about X, e.g., some input features, the attacker can still launch a successful false data attack, as indicated by Theorem 1. In fact, for short-term wind power forecasting problems, the design matrix X usually consists of numerical weather prediction (NWP) and calendar information, both of which can be publicly accessible. For example, NWP results can be purchased from meteorological agencies or calculated using computational fluid dynamics.
Then, the attacker can inject malicious data into the original dataset to degrade the quality of short-term wind power forecasting. The attack target is assumed to be the output variable y. Let $\varepsilon$ denote the vector of malicious data injected into the original dataset (also known as the attack vector). The attacker can choose any non-zero attack vector $\varepsilon$ and replace the original data $y$ with the malicious data:

$$ y_{\varepsilon} = y + \varepsilon. \tag{7} $$

As discussed in the previous part, the least-squares-based anomaly detection technique computes the L2-norm of the observation residual and then checks whether outliers exist. However, Theorem 1 (similar to Theorem 3.1 in [53]) shows that this anomaly detection tool cannot detect the false data attack if the attack vector $\varepsilon$ is a linear combination of the column vectors of the design matrix $X$. The proof of Theorem 1 is given in Appendix A.
Theorem 1. Suppose the original data $y$ pass the least-squares-based anomaly detection tool, i.e., $\| y - X\hat{\theta} \|_2 \le \tau$. Then the malicious data $y_{\varepsilon} = y + \varepsilon$ also pass the anomaly detection tool if the attack vector $\varepsilon$ is a linear combination of the column vectors of the design matrix $X$, i.e., $\varepsilon = X\delta$, where $\delta \in \mathbb{R}^n$ is an arbitrary non-zero vector.
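Why Theorem 1 holds can be read directly off the closed-form estimate (5). The derivation below is a sketch that assumes $X$ has full column rank, so that $X^T X$ is invertible; the formal proof is in Appendix A. It also shows that the attack shifts the least-squares estimate by exactly $\delta$ while leaving the residual untouched.

```latex
\begin{aligned}
\hat{\theta}_{\varepsilon} &= (X^{T}X)^{-1}X^{T}(y + X\delta) = \hat{\theta} + \delta, \\
y_{\varepsilon} - X\hat{\theta}_{\varepsilon} &= y + X\delta - X\hat{\theta} - X\delta = y - X\hat{\theta}, \\
\left\| y_{\varepsilon} - X\hat{\theta}_{\varepsilon} \right\|_2 &= \left\| y - X\hat{\theta} \right\|_2 \le \tau .
\end{aligned}
```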

Remark 1. According to Theorem 1, the attacker can construct an effective attack vector even with only local information about the design matrix $X$. For example, if the attacker knows only the second and third columns of the design matrix, i.e., $\tilde{X} = [x_2, x_3]$ (here $x_2$ and $x_3$ denote columns of $X$), then the attack vector can be constructed as $\tilde{\varepsilon} = \delta_2 x_2 + \delta_3 x_3$, where $\delta_2$ and $\delta_3$ are arbitrary non-zero values. This vector equals $X\delta$ with $\delta = (0, \delta_2, \delta_3, 0, \dots, 0)^{T}$ and therefore satisfies the condition of Theorem 1.
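The following NumPy sketch illustrates Remark 1 numerically on synthetic data (all sizes, seeds, and weights are arbitrary). The attacker uses only the second and third columns of $X$, yet the least-squares residual norm of the attacked data equals that of the original data up to rounding, while the injected vector itself is large. The helper `fit` is hypothetical.

```python
import numpy as np


def fit(X, y):
    """Least-squares estimate theta_hat and residual norm ||y - X theta_hat||_2."""
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta_hat, np.linalg.norm(y - X @ theta_hat)


rng = np.random.default_rng(1)
m, n = 200, 5
X = rng.normal(size=(m, n))                       # full design matrix, only partly known to the attacker
y = X @ rng.normal(size=n) + 0.1 * rng.normal(size=m)

# The attacker knows only columns x_2 and x_3 (0-based indices 1 and 2)
delta2, delta3 = 5.0, -3.0                        # arbitrary non-zero weights
eps_tilde = delta2 * X[:, 1] + delta3 * X[:, 2]   # equals X @ (0, 5, -3, 0, 0)
y_attacked = y + eps_tilde

theta0, r0 = fit(X, y)
theta1, r1 = fit(X, y_attacked)
print(f"residual norm before: {r0:.6f}, after: {r1:.6f}")
print("parameter shift:", np.round(theta1 - theta0, 6))   # close to (0, 5, -3, 0, 0)
```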
On the other hand, the attacker may have limited resources to launch the false data injection attack. As a result, the attacker may not be able to use an arbitrary $\varepsilon = X\delta$ as the attack vector, because some of the original data cannot be accessed. For example, due to limited resources, the attacker may be able to inject malicious data into only 40% of the original data; nothing can be injected into the remaining 60%. Thus, some elements of the attack vector $\varepsilon$ must be 0.
Here, we assume that the attacker has access to $L$ elements of the design vector $y$ (e.g., $L$ wind power measurements). Let $t_l$ be the index of the $l$-th accessible element ($l = 1, 2, \dots, L$), and let $\mathcal{L} = \{t_1, t_2, \dots, t_L\}$ be the set of indices of all $L$ elements that the attacker can access. According to Theorem 1, to pass the anomaly detection, the attacker should find an attack vector $\varepsilon$ that satisfies the following two conditions.

1. $\varepsilon_t = 0$ for $t \notin \mathcal{L}$ (the attacker cannot inject errors into elements that cannot be accessed).
2. $\varepsilon = X\delta$ ($\varepsilon$ is a linear combination of the column vectors of $X$).
To construct such an attack vector under limited resources, we first prove that the condition $\varepsilon = X\delta$ has an equivalent but more straightforward form [53].
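The proof of Theorem 2 is given in Appendix A. According to Theorem 2, the attacker needs to construct an attack vector $\varepsilon$ that satisfies $Q\varepsilon = 0$ and $\varepsilon_t = 0$ for $t \notin \mathcal{L}$. Let

$$ Q = \left[ q_1, q_2, \dots, q_m \right], \tag{8} $$

where $q_t$ ($t = 1, 2, \dots, m$) is the $t$-th column vector of $Q$. Let

$$ \varepsilon_t = \begin{cases} \varepsilon_{t_l}, & t = t_l \in \mathcal{L}, \\ 0, & t \notin \mathcal{L}, \end{cases} \tag{9} $$

where $\varepsilon_{t_1}, \varepsilon_{t_2}, \dots, \varepsilon_{t_L}$ are unknown variables. Substituting (8) and (9) into $Q\varepsilon = 0$, it is easy to verify that

$$ \varepsilon_{t_1} q_{t_1} + \varepsilon_{t_2} q_{t_2} + \cdots + \varepsilon_{t_L} q_{t_L} = 0. \tag{10} $$

Then, we create a new $m$-by-$L$ matrix $\tilde{Q} = [q_{t_1}, q_{t_2}, \dots, q_{t_L}] \in \mathbb{R}^{m \times L}$ and a new vector $\tilde{\varepsilon} = (\varepsilon_{t_1}, \varepsilon_{t_2}, \dots, \varepsilon_{t_L})^{T} \in \mathbb{R}^{L}$. Substituting $\tilde{Q}$ and $\tilde{\varepsilon}$ into (10), we obtain

$$ \tilde{Q} \tilde{\varepsilon} = 0. \tag{11} $$

According to Meyer [54], the solution of the equation $\tilde{Q}\tilde{\varepsilon} = 0$ is

$$ \tilde{\varepsilon} = \left( I_L - \tilde{Q}^{-} \tilde{Q} \right) d, \tag{12} $$

where $\tilde{Q}^{-}$ is a 1-inverse of $\tilde{Q}$ (i.e., $\tilde{Q} \tilde{Q}^{-} \tilde{Q} = \tilde{Q}$), $I_L$ is the $L$-by-$L$ identity matrix, and $d$ is an arbitrary $L$-dimensional vector chosen such that $\tilde{\varepsilon} \neq 0$. Using the non-zero solution $\tilde{\varepsilon}$, the attacker constructs the corresponding attack vector $\varepsilon$ by placing the elements of $\tilde{\varepsilon}$ at positions $t_1, t_2, \dots, t_L$ of $\varepsilon$ and filling the remaining positions with 0.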
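The construction above (the column selection in (10) followed by the 1-inverse solution from Meyer [54]) can be sketched as follows. Because the statement of Theorem 2 is not reproduced in this section, the code assumes, following the analogous result in [53], that $Q = X(X^{T}X)^{-1}X^{T} - I_m$, for which $Q\varepsilon = 0$ holds exactly when $\varepsilon$ lies in the column space of $X$. The Moore-Penrose pseudoinverse serves as the 1-inverse (it is one particular 1-inverse), with an explicit cutoff so that numerically zero singular values of $\tilde{Q}$ are treated as zero. With this $Q$, whose rank is $m - n$, a non-zero $\tilde{\varepsilon}$ is guaranteed when $L > m - n$; the toy example uses $m = 12$, $n = 3$, and $L = 10$. The function name and sizes are hypothetical.

```python
import numpy as np


def limited_resource_attack(X, accessible, d, cutoff=1e-10):
    """Attack vector that is zero outside `accessible` and lies in the column space of X.

    Assumes Q = X (X^T X)^{-1} X^T - I_m (following [53]); see the lead-in.
    """
    m = X.shape[0]
    P = X @ np.linalg.solve(X.T @ X, X.T)            # projector onto the column space of X
    Q = P - np.eye(m)
    Q_tilde = Q[:, accessible]                       # m-by-L matrix [q_t1, ..., q_tL]
    Q_minus = np.linalg.pinv(Q_tilde, rcond=cutoff)  # a 1-inverse: Q_tilde @ Q_minus @ Q_tilde == Q_tilde
    L = len(accessible)
    eps_tilde = (np.eye(L) - Q_minus @ Q_tilde) @ d  # general solution of Q_tilde @ eps_tilde = 0
    eps = np.zeros(m)
    eps[accessible] = eps_tilde                      # fill accessible positions, zeros elsewhere
    return eps


# Usage: the attacker can reach only the first 10 of m = 12 measurements
rng = np.random.default_rng(2)
X = rng.normal(size=(12, 3))
accessible = np.arange(10)
eps = limited_resource_attack(X, accessible, d=rng.normal(size=10))

y = X @ np.array([1.0, -1.0, 0.5]) + 0.1 * rng.normal(size=12)
res_before = np.linalg.norm(y - X @ np.linalg.lstsq(X, y, rcond=None)[0])
res_after = np.linalg.norm((y + eps) - X @ np.linalg.lstsq(X, y + eps, rcond=None)[0])
print(eps.round(3))
print(res_before, res_after)   # equal up to rounding
```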
Thus, even with limited resources, the attacker can still inject false data into the original data without being discovered by the least-squares-based anomaly detection tool. Such false data injection attacks can introduce large errors into wind power forecasts and seriously degrade prediction accuracy.

False Data Attack on Wind Power Data
In this subsection, the attack target is assumed to be wind power data. Such an attack can be launched by compromising some meters in the target wind farm or by directly hacking the servers that store the original dataset, as shown in Section 2.2.
First, we present the least-squares-based anomaly detection technique for identifying outliers in wind power data. The outliers in wind power data are detected by least squares regression on exogenous inputs. To capture the nonlinear relationship between wind power output and wind speed, three regressors are constructed, i.e., linear, quadratic, and cubic terms in wind speed. In addition, four Fourier regressors are constructed to model the daily seasonality observed in wind power data. The nonlinear regression model used to detect outliers in wind power data therefore regresses the wind power at time $t$ on seven regressors: $x_t$, $x_t^2$, $x_t^3$, and four Fourier terms in $d_t$, where $x_t$ is the wind speed at time $t$ ($t = 1, 2, \dots, m$) and $d_t$ is the hour of the day at time $t$ (taking the values $0, 1, 2, \dots, 23$). The $t$-th row of the design matrix $X$ contains these seven regressors evaluated at time $t$. According to Theorem 1 or Theorem 2, we can construct the attack vector $\varepsilon$ injected into wind power data: $\varepsilon$ will not be detected by the least-squares-based anomaly detection technique if it is a linear combination of the column vectors of $X$.
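A sketch of the seven-regressor design matrix and of an undetectable attack on wind power data follows. The exact Fourier terms are not given in this section; the code assumes the sine and cosine of the first two daily harmonics, i.e., $\sin(2\pi k d_t/24)$ and $\cos(2\pi k d_t/24)$ for $k = 1, 2$, with no intercept, matching the seven regressors listed above. The synthetic wind speeds, power values, attack weights, and function names are made up for illustration.

```python
import numpy as np


def wind_power_design_matrix(wind_speed, hour):
    """Rows: [x, x^2, x^3, sin(w), cos(w), sin(2w), cos(2w)], w = 2*pi*hour/24 (assumed Fourier form)."""
    x = np.asarray(wind_speed, dtype=float)
    w = 2.0 * np.pi * np.asarray(hour, dtype=float) / 24.0
    return np.column_stack([x, x**2, x**3, np.sin(w), np.cos(w), np.sin(2 * w), np.cos(2 * w)])


def residual_norm(X, y):
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.linalg.norm(y - X @ theta_hat)


rng = np.random.default_rng(3)
m = 24 * 30                                       # 30 days of hourly data (synthetic)
hour = np.arange(m) % 24
speed = rng.uniform(3.0, 15.0, size=m)
power = np.clip((speed / 15.0) ** 3 + 0.02 * rng.normal(size=m), 0.0, 1.0)

X = wind_power_design_matrix(speed, hour)
delta = np.array([0.0, 0.0, 1e-4, 0.05, 0.0, 0.0, 0.0])   # arbitrary attack weights
eps = X @ delta                                   # cubic bias plus a daily sinusoid
print(residual_norm(X, power), residual_norm(X, power + eps))   # equal up to rounding
```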

False Data Attack on Meteorological Data
Meteorological data is of crucial importance for short-term wind power forecasting and is generally obtained from external services. Thus, it is much easier for attackers to inject malicious data into meteorological information and then deteriorate the forecasting quality significantly.
First, we present the least-squares-based anomaly detection technique for identifying outliers in meteorological data. Unlike wind power data, whose outliers are detected using exogenous inputs (such as wind speed and direction), meteorological data usually have no exogenous inputs. Hence, the autoregression (AR) model is selected to identify outliers in meteorological data. The term autoregression indicates a regression of a meteorological variable against itself: the meteorological data are approximated by a linear combination of their own past values. The AR model of order $p$ for meteorological data can be written as

$$ \hat{x}_t = \theta_1 x_{t-1} + \theta_2 x_{t-2} + \cdots + \theta_p x_{t-p}. $$

Then, we calculate the residual between the observed value $x_t$ and the value $\hat{x}_t$ estimated by the AR model, i.e., $r_t = x_t - \hat{x}_t$. Given a threshold $\tau$, an outlier exists in the meteorological data if the absolute residual $|r_t|$ exceeds $\tau$.
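The AR-based detector can be sketched as follows. The AR coefficients are fitted by ordinary least squares on the lagged design matrix (one common choice; the section does not state the estimator), and each residual is compared with $\tau$. The function names, the order $p = 2$, and the synthetic daily cycle are illustrative assumptions.

```python
import numpy as np


def ar_design(series, p):
    """Lagged design matrix (row for time t holds x_{t-1}, ..., x_{t-p}) and targets x_t, t = p..m-1."""
    x = np.asarray(series, dtype=float)
    m = len(x)
    X = np.column_stack([x[p - k:m - k] for k in range(1, p + 1)])
    return X, x[p:]


def ar_outliers(series, p, tau):
    """Fit AR(p) without intercept by least squares; flag |r_t| > tau for t = p, ..., m-1."""
    X, target = ar_design(series, p)
    theta_hat, *_ = np.linalg.lstsq(X, target, rcond=None)
    return np.abs(target - X @ theta_hat) > tau


# Usage: a pure daily sinusoid satisfies x_t = 2cos(w) x_{t-1} - x_{t-2} exactly (an AR(2) recursion)
t = np.arange(240)
clean = np.sin(2.0 * np.pi * t / 24.0)
spiked = clean.copy()
spiked[100] += 5.0                                # one corrupted value
p, tau = 2, 1.0
print(np.flatnonzero(ar_outliers(spiked, p, tau)) + p)   # time indices flagged
```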
According to Theorem 1, we can construct the attack vector $\varepsilon$ injected into meteorological data. The attack vector will not be detected by the anomaly detection technique based on the above AR model if it is a linear combination of the column vectors of the design matrix $X$. Here, the design matrix $X$ of the AR model contains the lagged meteorological data in its rows, where $x_t$ represents the meteorological data at time $t$ ($t = 1, 2, \dots, m-1$). Then, the attack vector is constructed according to Theorem 1 or Theorem 2.

Wind Power Forecasting Models
In this section, six representative wind power forecasting approaches (three deterministic and three probabilistic) are briefly introduced. These approaches have been widely used in power system operation and planning [55,56]. Their robustness is investigated in the case studies of Section 7.

Deterministic Forecasting Models
Deterministic (or point) forecasting provides the conditional expectation of wind power output. Three wind power deterministic forecasting models are introduced as follows.

Multiple Nonlinear Regression (MNR)
MNR is one of the most widely used techniques for many forecasting problems. In short-term wind power forecasting, wind power output is treated as the response variable $y$, while wind forecasts from NWP and calendar variables are usually treated as input variables $x$ [4,31]. The generic fitting formula of MNR models is $y = \theta^{T} x$. Given the training dataset, the parameter vector $\theta$ can be estimated by minimizing the sum of squared errors, as shown in (2). The solution for $\theta$ is given in (5).

Artificial Neural Network (ANN)
ANN originates from algorithms that try to mimic the human brain. It has been widely used in many forecasting problems, including wind power forecasting, since the 1980s. Its popularity diminished in the late 1990s, but with the recent rapid development of deep learning, ANN has regained considerable attention. As a data-driven black-box approach, ANN can approximate the nonlinear relationship between the input variables and wind power output by learning from training data. ANN-based short-term wind power forecasting has also been commercialized and is used by many forecasting vendors [24].
In this paper, we use a three-layer feed-forward neural network for short-term wind power forecasting [24]. Every input neuron is connected to every hidden neuron, and a single output neuron represents wind power output. The sigmoid function is chosen as the activation function for both hidden and output neurons. The ANN model is trained by the Levenberg-Marquardt backpropagation algorithm. It has two important hyperparameters, i.e., the number of hidden neurons and the weight decay, both of which are optimized through k-fold cross validation.
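To make the network structure concrete, the NumPy sketch below implements a three-layer feed-forward network with sigmoid hidden and output neurons and an L2 weight-decay penalty. It is trained with plain batch gradient descent, a simpler substitute for the Levenberg-Marquardt algorithm used in the paper, and it omits the k-fold search over the number of hidden neurons and the weight decay. The function names, learning rate, initialization, and synthetic targets (scaled to (0, 1) to suit the sigmoid output) are arbitrary assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_inputs, n_hidden, rng):
    return {"W1": 0.5 * rng.normal(size=(n_inputs, n_hidden)), "b1": np.zeros(n_hidden),
            "W2": 0.5 * rng.normal(size=(n_hidden, 1)), "b2": np.zeros(1)}


def forward(params, X):
    h = sigmoid(X @ params["W1"] + params["b1"])            # hidden layer
    y_hat = sigmoid(h @ params["W2"] + params["b2"])[:, 0]  # single output neuron
    return h, y_hat


def loss(params, X, y, decay):
    """Half mean squared error plus weight decay on the weight matrices (biases not penalized)."""
    _, y_hat = forward(params, X)
    penalty = np.sum(params["W1"] ** 2) + np.sum(params["W2"] ** 2)
    return 0.5 * np.mean((y_hat - y) ** 2) + 0.5 * decay * penalty


def gradient_step(params, X, y, decay, lr):
    """One batch gradient-descent update (stand-in for Levenberg-Marquardt)."""
    h, y_hat = forward(params, X)
    d_out = ((y_hat - y) / len(y) * y_hat * (1.0 - y_hat))[:, None]  # dLoss/d(output pre-activation)
    d_hid = (d_out @ params["W2"].T) * h * (1.0 - h)                 # back-propagated to hidden layer
    grads = {"W2": h.T @ d_out + decay * params["W2"], "b2": d_out.sum(axis=0),
             "W1": X.T @ d_hid + decay * params["W1"], "b1": d_hid.sum(axis=0)}
    return {k: params[k] - lr * grads[k] for k in params}


# Usage on synthetic data: two scaled inputs, target in (0, 1)
rng = np.random.default_rng(4)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = 0.2 + 0.6 * X[:, 0] * X[:, 1]
params = init_params(n_inputs=2, n_hidden=5, rng=rng)
decay, lr = 1e-4, 0.5
loss_before = loss(params, X, y, decay)
for _ in range(2000):
    params = gradient_step(params, X, y, decay, lr)
loss_after = loss(params, X, y, decay)
print(loss_before, loss_after)
```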

Support Vector Machine (SVM)
SVM approaches first map the original data into a high-dimensional space through a nonlinear transform. Traditional linear regression can then be applied in the high-dimensional space, which is equivalent to nonlinear regression in the original low-dimensional space. Many papers have reported the use of SVM in short-term wind power forecasting [26].
In this paper, we utilize a specific SVM model called -SVM for short-term wind power forecasting [26]. For the training dataset $\{(x_i, y_i);\ i = 1, 2, \dots, m\}$, the -SVM model tries to obtain the parameter vector $\theta$ of the fitting hyperplane $y = \theta^{T} x$. SVM approaches have two important hyperparameters, i.e., γ and . Their optimal values are obtained through k-fold cross validation.

Probabilistic Forecasting Models
Unlike deterministic forecasting, probabilistic forecasting provides more information to quantify the uncertainty of wind power output, which is crucial for decision-making in power system operations under uncertainty [34]. The target of probabilistic forecasting is to provide the distribution of wind power output, represented by multiple quantiles $\hat{q}_t^{(\alpha)}$, where $\hat{q}_t^{(\alpha)}$ is the predicted α-quantile of wind power output at time $t$ and α is the quantile percentage. Three wind power probabilistic forecasting models are introduced as follows.

Quantile Regression (QR)
QR was first applied to short-term probabilistic forecasting of wind power generation in the early 2000s [29]. Since then, several variants of QR have been proposed [30,35], showing high accuracy in wind power forecasting.
QR is an extension of MNR whose output variable is a quantile of wind power output, so the quantile is expressed as a nonlinear combination of the input variables. Hence, the fitting formula of QR models takes the same form as that of MNR, with the predicted α-quantile $\hat{q}_t^{(\alpha)}$ as the output. Given the training dataset, the parameter θ of QR models is estimated by minimizing the pinball loss function (i.e., the tilted absolute deviation) [57]. Note that QR provides the estimate of only one quantile at a time, so it must be run once for each quantile of interest.
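The pinball loss that QR minimizes fits in a few lines. The sketch below is a minimal illustration, not the QR estimator used in this paper: it fits only a constant quantile, for which a minimizer of the summed pinball loss is an empirical α-quantile of the data. In QR the constant is replaced by the fitting formula and the same loss is minimized, typically by linear programming. Function names are illustrative.

```python
def pinball_loss(y, q, alpha):
    """Tilted absolute deviation of observation y for the alpha-quantile forecast q."""
    diff = y - q
    return alpha * diff if diff >= 0 else (alpha - 1.0) * diff


def best_constant_quantile(ys, alpha):
    """Constant q minimizing the summed pinball loss over ys.

    The summed loss is piecewise linear in q with kinks at the observations,
    so a minimizer is always attained at one of them.
    """
    return min(ys, key=lambda q: sum(pinball_loss(y, q, alpha) for y in ys))
```

For the observations 1, 2, ..., 10, the minimizer for α = 0.5 is a median (5 or 6) and for α = 0.9 it is 9 or 10; repeating the fit for each α gives one quantile per run, as noted above.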

Quantile Regression Neural Network (QRNN)
The idea of quantile regression was first combined with the artificial neural network in [58], and the combined tool was named QRNN. An ANN can estimate a nonlinear model without the nonlinear fitting formula being specified. Hence, in comparison with QR, QRNN has the advantage of estimating potentially nonlinear quantile models. In this paper, a feed-forward neural network with a single hidden layer is chosen as the QRNN structure [58], which is identical to the ANN structure in Section 4.1.2. QRNN's parameters are estimated by minimizing the quantile regression error function, and its hyperparameters, i.e., the number of hidden neurons and the weight decay, are optimized by k-fold cross validation.

K-Nearest Neighbors (KNN) and Kernel Density Estimator (KDE)
We proposed KNN-KDE in [31] for short-term wind power probabilistic forecasting and used it to participate in the Global Energy Forecasting Competition 2014 (GEFCom2014), where it ranked in the top 5 on the wind forecasting track, verifying its effectiveness in short-term wind power probabilistic forecasting.
The main idea of KNN-KDE models is that similar situations lead to similar outcomes, and these similar outcomes are used to predict the distribution. First, the KNN algorithm mines the historical dataset to find the examples whose weather situations are most similar to that of the target forecasting hour. These examples are known as nearest neighbors. Second, the wind power output measurements of the nearest neighbors are extracted, and the KDE method is applied to them to construct the distribution of wind power output for the target hour. The predicted distribution is finally converted to the quantiles of interest. Details of KNN-KDE can be found in [31].
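The two stages can be sketched as follows. This is a minimal sketch, assuming Euclidean distance for the neighbor search, a Gaussian kernel with a fixed bandwidth, and quantiles obtained by inverting the KDE's cumulative distribution function by bisection; the distance measure, kernel, bandwidth selection and handling of the [0, 1] bounds of normalized power in [31] may differ, and the function name is illustrative.

```python
import math


def knn_kde_quantiles(train_x, train_y, x_target, k, bandwidth, alphas):
    """Quantiles of a Gaussian KDE built on the k nearest neighbours' power outputs."""
    # Step 1 (KNN): rank historical examples by the Euclidean distance of their
    # weather inputs to the target hour and keep the k closest.
    ranked = sorted(zip((math.dist(x, x_target) for x in train_x), train_y))
    neighbours = [y for _, y in ranked[:k]]

    # Step 2 (KDE): the CDF of a Gaussian KDE is the average of normal CDFs
    # centred on each neighbour's measured power.
    def kde_cdf(v):
        return sum(0.5 * (1.0 + math.erf((v - y) / (bandwidth * math.sqrt(2.0))))
                   for y in neighbours) / len(neighbours)

    # Step 3: convert the distribution to quantiles by bisection on the CDF.
    quantiles = []
    for alpha in alphas:
        lo = min(neighbours) - 10.0 * bandwidth
        hi = max(neighbours) + 10.0 * bandwidth
        for _ in range(100):
            mid = 0.5 * (lo + hi)
            if kde_cdf(mid) < alpha:
                lo = mid
            else:
                hi = mid
        quantiles.append(0.5 * (lo + hi))
    return quantiles
```

For example, if the three nearest neighbors have measured outputs 0.4, 0.5 and 0.6, the KDE is symmetric about 0.5, so the returned 0.5-quantile is 0.5 and the 0.1- and 0.9-quantiles sit symmetrically around it.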

Robustness Assessment Framework of Wind Power Forecasting Model
This section introduces the framework for assessing the robustness of wind power forecasting models, which will be used in our case study. First, the accuracy evaluation of wind power deterministic and probabilistic forecasting is presented. Then, we propose the Monte Carlo simulation framework to emulate false data attacks and evaluate the robustness of any wind power forecasting model.

Accuracy Evaluation of Wind Power Forecasting
For wind power deterministic forecasting, root mean square error (RMSE) is chosen as the measure to evaluate the accuracy:
$$\mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(y_t - \hat{y}_t\right)^2},$$
where $y_t$ and $\hat{y}_t$ are the observation and the prediction of wind power output, respectively, and T is the number of forecast instances. For wind power probabilistic forecasting, the quantile score (QS) is selected as the measure to quantify the skill of probabilistic forecasting; it is computed from $\hat{q}_t^{(\alpha_p)}$, the predicted $\alpha_p$-quantile of wind power output. Like RMSE, QS is negatively oriented, meaning that lower is better.
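Both measures can be computed directly from forecasts and observations. This is a minimal sketch, assuming QS is the pinball loss averaged over all forecast times and all quantile levels (the convention used in GEFCom2014); the exact normalization in this paper may differ, and the function names are illustrative.

```python
import math


def rmse(y_obs, y_pred):
    """Root mean square error over all forecast instances."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_obs, y_pred)) / len(y_obs))


def pinball(y, q, alpha):
    """Pinball loss of the alpha-quantile forecast q for observation y."""
    return (alpha - (1.0 if y < q else 0.0)) * (y - q)


def quantile_score(y_obs, q_pred, alphas):
    """Average pinball loss; q_pred[t][p] is the alphas[p]-quantile forecast at time t."""
    total = sum(pinball(y, q, a)
                for y, qs in zip(y_obs, q_pred)
                for q, a in zip(qs, alphas))
    return total / (len(y_obs) * len(alphas))
```

Both functions return values in the units of the inputs, so with normalized wind power they can be reported as a percentage of nominal power by multiplying by 100.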

Robustness Assessment Framework
To study the influence of false data attacks on the accuracy of wind power forecasting, we establish the Monte Carlo simulation framework of false data attacks [59] and then use this framework to assess the robustness of wind power forecasting. The framework works as follows, and its flow chart is given in Figure 3. Its output is the forecasting error under false data attacks.

Step 1. Initialization:
Step 1.a: Given all training samples $\{(x_t, y_t);\ t = 1, \ldots, m\}$, construct the design matrix X and the design vector y. Then, calculate $Q = X\left(X^{\mathsf T}X\right)^{-1}X^{\mathsf T} - I = [q_1, q_2, \ldots, q_m]$ according to Theorem 2.
Step 1.b: Initialize the iteration counter, ν = 0, and set the tolerance σ. Suppose that the attacker injects malicious data into ρ% of the original data (0 < ρ ≤ 100). The number of attacked data points is $L = \rho\% \times m$.
Step 2.d: Inject the attack vector ε into the training dataset to obtain the malicious data $y_\varepsilon = y + \varepsilon$.
Step 2.e: Use the training dataset (X and $y_\varepsilon$) to train one of the six wind power forecasting models (MNR, ANN, SVM, QR, QRNN, or KNN-KDE).
Step 2.f: Evaluate the forecasting model and tune its hyperparameters on the validation dataset to obtain the final forecasting model.

Step 3. Stopping Condition:
Step 3.a: Evaluate the forecast error (RMSE or QS) $E_\nu$ on the test dataset.
Step 3.b: Collect all forecast errors up to the current iteration, $E_i$ (1 ≤ i ≤ ν), and calculate the variance coefficient β.
Step 3.c: If β ≤ σ, terminate with the final forecast error being the average error of all iterations, i.e., $\frac{1}{\nu}\sum_{i=1}^{\nu} E_i$. Otherwise, return to Step 2.
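The stopping logic of Steps 2-3 can be sketched independently of any particular forecasting model. This is a minimal sketch: `run_one_attack` is a hypothetical callable standing in for Steps 2.a-3.a (generating and injecting an attack vector, training and tuning a model, and returning its test error), and because the formula for β is not reproduced in this section, the code assumes the common Monte Carlo convergence index β = s / (√ν · Ē), i.e., the standard error of the mean error divided by the mean error.

```python
import math
import statistics


def variance_coefficient(errors):
    """ASSUMED definition of beta: standard error of the mean divided by the mean."""
    if len(errors) < 2:
        return math.inf  # undefined for a single sample, so keep iterating
    mean = statistics.fmean(errors)
    return statistics.stdev(errors) / (math.sqrt(len(errors)) * mean)


def monte_carlo_robustness(run_one_attack, sigma=0.01, max_iter=10_000):
    """Repeat attack-and-evaluate runs until beta <= sigma (Step 3.c).

    run_one_attack(i) is a hypothetical callable covering Steps 2.a-3.a for
    iteration i and returning the test error E_i (RMSE or QS).
    Returns (average error over all iterations, number of iterations).
    """
    errors = []
    while len(errors) < max_iter:
        errors.append(run_one_attack(len(errors)))   # Steps 2 and 3.a
        if variance_coefficient(errors) <= sigma:    # Steps 3.b-3.c
            break
    return statistics.fmean(errors), len(errors)
```

With σ = 0.01 and errors whose standard deviation is about 10% of their mean, the assumed criterion needs on the order of 100 iterations, since β shrinks roughly as 1/√ν.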

Data and Model Setup
In this section, the data used for the numerical experiments are first introduced. Then, the setup of all wind power forecasting models is presented, including the selection of their input variables.

GEFCom2014 Data
In this paper, the robustness of wind power deterministic and probabilistic forecasting is studied using GEFCom2014 data, which cover ten wind farms. For each farm, the data provide 24 months of normalized wind power measurements and NWP predictions of wind speed and direction at 10 m and 100 m, with a temporal resolution of 1 hour. The data are split into three subsets: the training subset (January 2012-December 2012) to fit the forecasting model, the validation subset (January 2013-June 2013) to tune the model hyperparameters, and the test subset (July 2013-December 2013) to evaluate the forecast accuracy of the final model. The test subset is never used to fit the final forecasting model.
Note that all forecasting models implemented in this paper, whether deterministic or probabilistic, use only NWP and calendar information to construct their input variables. Other input variables are not considered in our study due to the lack of strong evidence of their effectiveness in practice.

Setup of Deterministic Forecasting Models
Our target is to provide wind power forecasts for the next 24 h. For short-term wind power deterministic forecasting, the MNR model [4,31] is trained with the fitting formula (22), where $w_t$ is the wind speed prediction at 100 m from NWP and $d_t$ is the hour of the day, taking the values 0, 1, 2, ..., 23.
In (22), the first three terms form a cubic polynomial in the wind speed prediction, describing the sigmoid-shaped speed-to-power curve of the wind turbine, while the last four terms describe the diurnal patterns observed in winds [4]. In our case studies, (22) is also used as the fitting formula of the least-squares-based anomaly detection technique, as shown in (13).
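Because the MNR model is linear in its parameters, it can be fitted by ordinary least squares, which is also the basis of the anomaly detection technique mentioned above. This is a minimal sketch under loudly stated assumptions: the exact form of (22) is not reproduced here, so the hypothetical `mnr_features` assumes an intercept plus the cubic polynomial in $w_t$ and, as a stand-in for the four diurnal terms, sine/cosine pairs with 24 h and 12 h periods in $d_t$. The normal-equation solver is written out only to keep the example dependency-free.

```python
import math


def mnr_features(w, d):
    """HYPOTHETICAL design-matrix row: intercept, cubic in wind speed w, and
    four ASSUMED diurnal terms (24 h and 12 h sine/cosine pairs in hour d)."""
    return [1.0, w, w ** 2, w ** 3,
            math.sin(2 * math.pi * d / 24), math.cos(2 * math.pi * d / 24),
            math.sin(4 * math.pi * d / 24), math.cos(4 * math.pi * d / 24)]


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_least_squares(rows, y):
    """theta = (X^T X)^{-1} X^T y via the normal equations."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear(xtx, xty)
```

On noise-free synthetic data generated from known coefficients, `fit_least_squares` recovers those coefficients up to rounding error; in practice a QR- or SVD-based solver is numerically safer than forming $X^{\mathsf T}X$.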
ANN and SVM models use identical input variables. There are eight of them: the wind speed and direction predictions and the (u, v) wind component predictions, each at 10 m and 100 m.

Setup of Probabilistic Forecasting Models
For short-term wind power probabilistic forecasting, we provide 9 quantiles $\hat{q}_t^{(\alpha_p)}$. In our study, the fitting formula of QR models is the same as that used in MNR models (i.e., (22)), with the predicted quantile as the output. Input variables of QRNN and KNN-KDE models are identical, and their effectiveness has been verified in GEFCom2014. Here, we use four input variables, i.e.,

Numerical Results
In this section, we perform numerical experiments using the framework proposed in Section 5.2 and then compare the robustness of different forecasting models under various levels of false data attacks. The proposed robustness assessment framework is implemented in RStudio with R 3.5.3. All computation is run on a desktop computer with an i7-8700 processor and 32 GB RAM. In the Monte Carlo simulation, the tolerance σ to stop the iteration is set to 0.01. Note that the RMSE and QS values reported in this section have been averaged over all look-ahead times and all test data, and both are expressed as a percentage of the nominal wind power.

Results of Deterministic Forecasting
This subsection investigates the robustness of short-term wind power deterministic forecasting approaches (i.e., MNR, ANN, and SVM) under false data attacks. Table 1 gives the RMSE results of the three models without false data attacks; the smallest RMSE for each wind farm is shaded gray. From Table 1, it can be found that SVM is the most accurate approach, followed by ANN and finally MNR. To study various levels of false data attacks, we vary the percentage of malicious data injected into the original dataset (i.e., ρ%) from 10% to 100% in steps of 10%. Pairwise comparisons of the three approaches at each attack percentage and on each wind farm are visualized as scatter diagrams in the first three subfigures of Figure 4. Each of Figure 4a-c compares two of the three forecasting approaches: its x-axis and y-axis represent the RMSE of the two approaches, and each point represents one wind farm under a specific attack percentage. The diagonal line (black dotted line) indicates that the x-axis and y-axis approaches provide the same RMSE. A point above the diagonal line means that the y-axis approach has a larger RMSE than the x-axis approach, and vice versa.
From Figure 4a,b, it can be found that most points are below the diagonal line, meaning that ANN and SVM have a lower RMSE than MNR at most farms under most levels of false data attacks. As for the comparison between SVM and ANN in Figure 4c, their performance is very close, and SVM seems to have a slightly lower RMSE than ANN. To better compare ANN and SVM, Figure 4d gives the average RMSE over all farms. From Figure 4d, it can be seen that SVM has a lower RMSE when the attack percentage is less than 50% or more than 80%, whereas ANN provides the most accurate forecasting results for percentages between 50% and 80%.

Case II: False Data Attacks on Input Variable "WS100"
In the previous part, we studied the impact of false data attacks on the output variable "WP". Input variables can also be attacked. Here, "WS100", the input variable most relevant to wind power forecasting, is selected as the attack target. False data attacks on "WS100" are similar to those on "WP", which were introduced in Section 3.4. Figure 5 compares the RMSE values under the attack on "WS100" with those under the attack on "WP". As Figure 5 shows, attacking "WS100" has less influence on the accuracy of wind power forecasting than attacking "WP". Even for ρ% = 100%, the RMSE values under the attack on "WS100" increase by only 2.55%, 2.58%, and 1.43% (in comparison with ρ% = 10%) for MNR, ANN, and SVM, respectively. In fact, "WS100" comes from NWP, which has its own forecast error. Injecting false data into "WS100" might either worsen or improve NWP quality, depending on whether the injected false data offset the forecast error.
This makes the effect of false data attacks on "WS100" highly uncertain, so they have less impact on the accuracy of wind power forecasting than attacks on "WP". Therefore, more attention should be paid to protecting the data security of the output variable "WP".
The number of training samples can also greatly influence the performance of forecasting approaches. If the number of training samples is very small, it is much easier for attackers to attack the whole dataset with very limited resources. To investigate the impact of the sample number, the length of the training data is varied from 18 months to 6 months in decrements of 4 months. Table 2 gives the average RMSE of the three models under various levels of false data attacks. We choose the RMSE for 18 months as the benchmark and then calculate the rate of change (ROC) of RMSE for 14, 10, and 6 months; the ROC percentage is shown in brackets. A negative ROC means less accurate forecasting results than the benchmark, and vice versa. From Table 2, it can be observed that the RMSE results of forecasting models trained on 18-month and 14-month data are very close. However, when the training data drop to 6 months, the RMSE values increase very significantly, and this increase is more pronounced for large values of ρ%. This means that wind power forecasting models are less robust and more vulnerable when they use a small number of training samples; conversely, increasing the number of training samples can improve the robustness of wind power forecasting under false data attacks.
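The stated sign convention, where a negative ROC indicates a less accurate model, corresponds to measuring the change relative to the benchmark with the benchmark error first. The helper below is a minimal sketch of one formula consistent with that convention; the function name and the numbers in the example are hypothetical, not values from Table 2.

```python
def rate_of_change(benchmark_error, error):
    """ROC in percent; negative when `error` (e.g. RMSE) is worse than the benchmark."""
    return (benchmark_error - error) / benchmark_error * 100.0


# Hypothetical RMSE values (% of nominal power), not taken from Table 2.
print(rate_of_change(12.0, 13.5))  # -12.5: larger error than the benchmark
```

Dividing by the benchmark makes the ROC comparable across models whose absolute RMSE levels differ.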

Results of Probabilistic Forecasting
In this part, we investigate the robustness of short-term probabilistic forecasting models (QR, QRNN, and KNN-KDE) under false data attacks. Table 1 gives the QS results of the three models without false data attacks: KNN-KDE is the most accurate approach, followed by QRNN and finally QR. To study the impact of false data attacks on probabilistic forecasting, we vary the percentage of malicious data injected into the original dataset from 25% to 100% in steps of 25%. Table 3 gives the average QS over all farms under various levels of false data attacks. From Table 3, it can be observed that the QS values of all three approaches increase with the attack percentage (ρ%), meaning less accurate probabilistic forecasts under false data attacks. Among the three approaches, KNN-KDE demonstrates the strongest robustness, with the lowest QS at every attack percentage; QRNN and QR rank second and third, respectively. Figure 6 compares the QS values of the three models on 10 wind farms for ρ% = 75% and 100%. From Figure 6, it can be found that KNN-KDE always provides the lowest QS value on all farms. In contrast, KNN-KDE beats QRNN on only 5 wind farms without false data attacks (ρ% = 0%, as shown in Table 1). Moreover, the accuracy improvement of KNN-KDE over QRNN is more significant for ρ% = 100% than for ρ% = 75%. This means that KNN-KDE can provide more accurate probabilistic forecasts under very strong false data attacks.

Table 4 shows the QS results of the three probabilistic forecasting models under false data attacks on the input variable "WS100". In Table 4, we select the QS results under no attacks (i.e., ρ% = 0%) as the benchmark and then calculate the ROC for ρ% = 25%, 50%, 75%, and 100%, respectively; the ROC is shown in brackets. By comparing Tables 3 and 4, we can see that attacking "WS100" has less impact on the accuracy of wind power probabilistic forecasting than attacking "WP". For KNN-KDE in particular, attacking the whole dataset (i.e., ρ% = 100%) leads to only a 0.19% increase in QS, whereas attacking "WP" leads to an increase of nearly 29%. This indicates that the data security of the output variable "WP" is more important for both deterministic and probabilistic forecasting. Note that only the results of attacking "WS100" are shown, because "WS100" is the input variable with the greatest influence on the forecasting accuracy under false data attacks. Table 5 gives the QS results of the QR, QRNN, and KNN-KDE models trained on 18-month ("18M") and 6-month ("6M") data. From Table 5, we observe a very large increase in QS for all three models as the training data shrink from 18M to 6M. However, the accuracy of KNN-KDE deteriorates much more slowly than that of the other two approaches, indicating its robustness when only a small number of training samples is available.