1. Introduction
With the rapid development of smart grids, information and communication technologies are widely applied to smart grids, which makes it more complex in structure and inevitably introduces more security issues. The cyberattack on the Ukraine power grid is a typical smart grid attack event that utilized BlackEnergy version 3 to make seven 110 and 2335 kV substations disconnected for three hours [
1]. In 2010, at least 14 industrial sites in Iran were infiltrated by the Stuxnet worm [
2], Stuxnet falsified the programmable logic controllers’ (PLCs) programming, resulting in the damaging of centrifuges. Due to the limited resources of those terminal devices, the traditional security detection method cannot be deployed, making it difficult to detect if the device has been compromised. As a critical data collection, communication, and control unit in the smart grid, as shown in
Figure 1, once the distribution terminal unit (DTU) is attacked and does not detect in time, it will cause undesirable and often severe economic losses and may even cause major casualties.
The existing protection measures of power industrial control systems include program-based, protocol-based, device-based, and rule-based [
3]. For the program-based method, the firmware integrity check tool and malware identification tool have been used for abnormal detection in the corporate environment [
4]. However, these technologies are difficult to apply to industrial scenarios. First, these tools are difficult to deploy on legacy devices. Second, the deployment of these tools will affect the original system, so that they may occupy the constrained resources. Moreover, if an attacker has control over the device, these tools are easy to fail.
Prior work [
5] has shown that the malicious code execution of PLC can be identified through the controller’s electromagnetic (EM) emanation signals collected from a proprietary chip. EDDIE [
6] also proposed an EM-based detection of deviations on IoT devices, EDDIE relies on spikes in the EM spectrum that are produced as a result of periodic activity in the monitored execution. However, those EM-based anomaly detection methods are susceptible to electromagnetic interference (EMI) and need to install the electromagnetic probe near the key chip inside the equipment, which may affect the heat dissipation and bring security issues. The most related works are the power-based anomaly detection methods [
7,
8,
9,
10,
11], which utilize the power consumption related to the running program to detect anomalies of software-defined radios (SDRs), PLC, etc. Narasimhan et al. proposed a new Trojan detection method based on non-invasive, multi-parameter side-channel analysis, which filters noise through dynamic current and proposed a vector generation method and several design/test technologies to improve detection sensitivity [
12]. Huang et al. proposed that the decomposition of current waveform into active and nonactive current can improve the identification of equipment in the non-invasive appliance load monitoring, but this method is aimed at coarse-grained device power consumption monitoring [
13]. Kim et al. put forward a power-aware malware detection framework for mobile devices. By collecting power samples and building a power consumption history, their method could realize a 99% true positive rate in classifying mobile malware that runs out of battery power [
14]. However, their approach applies only to a particular malicious application. Clark et al. proposed an add-on monitoring system, WattsUpDoc, which could detect malware on medical devices through power side-channel [
15]. However, the detection accuracy of previously known malware is only 94%, and that of unknown malware is only 85%. Another significant difference is that our method can monitor the running state of devices containing different CPUs; that is, the change in the program running on each type of CPU can be monitored. This suggests the feasibility of monitoring multi-core CPU or multi-type CPU devices.
There are also some related works to infer the privacy of users using side-channel information and machine learning algorithms. Yan et al. demonstrate that power consumption traces can be used as a practical side channel to gain various types of confidential information of mobile apps running on smartphones [
16]. By monitoring the public USB charging, Yang et al. realized that the conjecture of the mobile web page can be viewed with an accuracy of 90% [
17]. Conti et al. could distinguish the user’s notebook PC by capturing the smart meter data of the charging station, with an accuracy of 80% [
18]. Fan et al. attacked the AES module through correlation power analysis [
19]. Although the methods are similar to our work, the above works aim to use power consumption to get the user’s private information.
In recent years, machine learning has made remarkable achievements in computer vision, speech recognition, natural language processing, medical data analysis, and so on, showing the powerful ability of machine learning in solving classification, prediction, and auxiliary decision making. Machine learning technology provides a new way to solve the problem of network security which is difficult to model by traditional methods.
In the field of machine learning, it is divided into supervised learning and unsupervised learning according to whether there are labels in the data set. In supervised learning, each set of data has a label, for example, each piece of data in spam is divided into spam or normal mail. Common supervised learning algorithms include logical regression, artificial neural network, support vector machine, decision tree, random forest, and so on. The data does not contain label information in unsupervised learning, but the intrinsic association of data can be inferred by unsupervised learning algorithms, such as the K-means algorithm, KNN algorithm, hierarchical clustering algorithm, graph clustering algorithm and so on. In our work, we use supervised learning to solve the anomaly detection for terminals of the smart grid. In addition, deep learning, transfer learning, deep enhancement learning, and generative adversarial networks have provided new options for solving intrusion detection. Depth learning is used to solve security problems such as anomaly protocol detection [
20], malware detection [
21], network intrusion detection [
22], BGP anomaly routing detection [
23] and differential privacy protection [
24]. Transfer learning, which is good at scene or domain transfer, is also used to solve network security problems, such as using transfer learning to correct side-channel signals in hardware Trojan detection [
14].
In this paper, we propose a power-based non-intrusive anomaly detection method to detect the executing program change of DTU using a machine learning method without introducing security issues and affecting the real-time performance of the system. The basic idea is to analyze the power consumption by means of machine learning (ML), and infer whether the applications running on the DTU has changed. As shown in
Figure 2, the underlying physical principle is that the DTU always executes fixed applications that do not always change in the whole life cycle, and meanwhile, it’s hardware characteristics enable a strong corresponding relationship between its power consumption and it’s executing programs. We also found that when the program has been falsified, even the behavior of the DTU looks normal, the power consumption will also change accordingly. Therefore, by analyzing the real-time operating power, it is possible to detect the running state of the DTU.
Note that as the DTUs run on VxWorks, which is a real-time operating system (RTOS), in order to increase the character of real-time, DTU adopts two different CPU units (including a DSP and ARM processor), thus, all the jobs (such as communication, display, FFT, etc.) can be processed in real-time with the distributing calculation of the two CPUs that are in charge of different applications. For example, the ARM processor is responsible for communication and switch control applications, DSP runs the applications of data collection and processing. Due to the fact that the ARM and DSP are integrated on the same chip, it hard to extract the corresponding power traces of each CPU to detect program change respectively. In this paper, we obtain the power of two CPUs and its peripheral components by measure the voltage drop across a resistor that connects between the motherboard and DC power supply, as the voltage drop is proportional to the power consumption of the target circuit [
25].
To validate the idea and implement the anomaly detection approach, we first set up a testbed based on DTU in the laboratory. Secondly, we design a normal program and four attack programs on the DTU, which aims to change the applications separately without being noticed by the operator. Third, we design a power sensor to collect the power traces and chose representative features to capture the difference of power traces. We then use those features to build ML models for anomaly detection. We compare three different types of classifiers: One-class, binary-class, multiclass classification algorithm, and select the best classifiers among seven algorithms. Finally, we implement the scheme on ubuntu.
We show that the anomaly detection scheme achieves a detection accuracy above 99.98% with lower system overhead. In addition, our scheme has the following advantages: the scheme does not require modifying the hardware or software, which does not bring potential vulnerability, and the deployment will not interrupt the normal operation of the power grid. In summary, our contributions are summarized as follows:
We propose a lightweight and non-intrusive anomaly detection approach on DTU by analyzing the power consumption.
We explore the strong corresponding relationship between power and applications and design the detection system that can detect the application change by analyzing the power.
We implement the detection system on Ubuntu and evaluate its performance on the five programs with different ML algorithms, a video demo of the real-time anomaly detection system is available at
https://youtu.be/_2GjTkbYWk8. The result shows that the detection system can achieve a detection accuracy of up to 99.98%.
2. Materials and Threat Model
2.1. Materials
In this work, the resistors were purchased from Shanghai Yixin Trading Co., Ltd. (Shanghai, China). The DTU and power supply were purchased from CYG Sunri Co., Ltd. (Shenzhen, China). The DAQ (DAQ U2541a) were purchased from Keysight Technologies (Santa Rosa, CA, USA).
2.2. Threat Model
In this section, we briefly introduce the threat model of the DTU system. The adversary’s goal is to hack the power grid system including DTUs and execute unauthenticated applications that cannot be detected by the traditional security method. We assume that the adversary has the ability to access the DTU by utilizing the vulnerability of the smart grid system or the adversary is the worker who can directly operate and download programs to the DTU, and he actually has been bought to damage the smart grid system.
In this paper, we design four types of attacks against the DTU, which mainly falsify the code running in the DSP and ARM processor. These attacks are characterized by high concealment as the attacked smart device seems to work normally, but the data collected from the smart grid is actually forged. Next, we will introduce the four attacks in detail.
- (1)
The Switch on/off Attack (Attack1).
The switch on/off attack mainly changes the code running in the ARM processor. One of the most important applications for DTU is to measure various values on the power lines through current or voltage sensors. When some measured values are greater than the preset threshold, the DTU will close the overloaded line through the controlling switch. The switch on/off attack makes the power switch controlled by DTU continuously switch on and switch off at a very fast rate, thereby affecting the normal operation and the stability of the power grid and reducing the service life of the power switch. When the frequency of the switch on/off is high enough, the friction loss of the switch will become very large, and people will not be able to detect the abnormal situation of the power grid.
- (2)
The Data Collection Attack (Attack2).
The data collection Attack mainly changes the code running in the DSP processor. In general, the sampling rate of the current/voltage collection is about 1000 SPS (Sample Per Second). The attack significantly decreases the sampling rate to 50 SPS, but the frequency of data uploaded to the monitoring center does not change. The lacking samples will be forged according to the collected sample. This type of attack makes DTU unable to accurately and timely sense the status of the smart grid.
- (3)
The Channel Attack (Attack3).
The channel attack mainly changes the code running in the DSP processor. As we know, the DTU has the ability to monitor the status of several power lines, such as the values of the current and voltage. The channel attack aims to reduce the number of monitoring power lines to one, the current and voltage values of other power lines are forged according to the values of the monitoring line. Therefore, the forged value is difficult to be seen through, and when the unmonitored line is overloaded, the DTU cannot take protection operations based on the actual value, which will bring catastrophic consequences for the smart grid.
- (4)
The Overload Attack (Attack4).
The overload attack mainly changes the code running in the ARM processor. In addition to running legitimate programs, the attackers will download and run an additional program on DTU that is meaningless but takes up a lot of computing and storage resources, so that the overload attack could deprive the resources of legitimate applications, thus, the legitimate service cannot get a timely response. For example, when the load of the power grid is short-circuited, the DTU cannot measure the current and disconnected the load in time when under the overload attack.
Essentially, these four types of attacks can be regarded as falsification of executed programs inside the device. Note that the operating system version of the Data Collection Attack is V3.30, and the others are V3.10. No matter what operating system version, our proposed attack detection method can defend against those kinds of attacks.
3. Methods
In this section, we first introduce the background of power consumption and present the feasibility from the perspective of physical principle and experimental evidence, and then we introduce the design of the monitoring system based on the power consumption side-channel.
3.1. Power Side-Channel
Side-channel information refers to the information of sound, power consumption, electromagnetic radiation, and even the running time generated by electronic equipment when the program is running. These channels are different from the channels used in normal communication, so they are called side-channel information. Side-channel signals reflect the execution of a set of particular instructions inside the processor; thus, they can be leveraged to monitor program execution [
26].
The power and EM side-channel signals are the most commonly used signals for anomaly detection. In theory, compared with EM signals, the accuracy of using power consumption for anomaly detection is higher because the voltage fluctuation in power consumption caused by code execution will radiate out in the form of EM wave through the components of the circuit, such as the processor or random-access memory, which can act as antennas [
26]. And in the process of radiation and reception, the information related to program changes will further be lost. Thus, we select the power consumption signal as the candidate signal.
Another advantage of anomaly detection through power side-channels is that the specific source code running in DTU is not required, or even that we do not need to know which functions the program calls. Traditional malware detection requires code analysis of the program, which is not feasible in our case because the device manufacturer is unwilling to share its code with the users, a situation that exists in many cases.
3.2. Feasibility Analysis
In this section, we present how power consumption is related to the executed program from the perspective of physical principles and experimental evidence.
Firstly, from the perspective of physical principles, the root cause of chip power consumption is the flip of the gate circuit. Different programs have diverse instructions and each instruction calls varied modules in the chip, so the power consumption also should be different. So it is feasible to distinguish different programs by power consumption information from the view of physical principles.
Secondly, from the perspective of experimental evidence, we firstly perform simple experiments on the DTU to verify the feasibility of power side-channel analysis. We launched four attacks and collect the corresponding power signal, and extract 34 features with the PyaudioAnalysis [
27].
Figure 3 shows the original power signal captured by our data acquisition card, and
Figure 4 shows the features of different power signals. We can see from the original power signal that although the overall difference between normal and abnormal programs is small, there are still some minor differences. From the features of different power signals, we can see that the features are distinguishable and the gap between different features is obvious. Moreover, we find that the features from the data collection attack have significant differences with other features. We guess that it is because the data collection attack was run on a different operating system version.
Thus, based on the above description and discussion, we can utilize the machine learning method to detect different attacks.
3.3. Overview of System
The design of the detection system is mainly composed of two parts, i.e., the hardware design capturing the power consumption of the DTU, and the software design which analyzes the collected power consumption to infer the running status of DTU, as shown in
Figure 5. In the following, we will reveal the details of the two parts.
3.4. Hardware Design
The key to detecting attacks is to obtain the side-channel information related to the executed program and then extract the relevant features. There are several side-channel collection methods commonly used to obtain the running status of the device, i.e., by capturing the electromagnetic wave and temperature radiation from the devices or collecting the power consumption from the power supply line. Radiation information is vulnerable to environmental interference and will lose some information in the process of radiation and receive, thus, we select the power consumption side-channel information that has strong anti-disturbance ability against the external ambient noise (Algorithm 1).
Algorithm 1 Power-Based Anomaly Detection. |
|
There are two common ways to capture power information, one is to capture the voltage drop of a sampling resistor, and the other utilizes electromagnetic induction from the power supply line. The induction scheme has an advantage over the sampling resistor scheme, that is, it can be completely isolated from the original system, and does not modify the hardware of the original system. However, compared with the resistor scheme, the induction scheme is also susceptible to electromagnetic interference existing everywhere in the environment and also losing valid information. Therefore, we finally chose the sampling resistor as the power sensor.
Specifically, the scheme first collects the power consumption information of DTU by capturing the voltage drop across a resistor, which connected between the power supply and DTU, then transmits the raw power consumption signal to the analysis part, as shown in the left part of
Figure 5. Instead of developing a data acquisition instrument [
28,
29,
30,
31,
32], we use the Agilent U2541A [
33] in this paper to capture the voltage drop across the resistor and the Agilent U2541A supports a maximum sampling rate of 250 kHz and 16-bit resolution.
According to our test results, the rated power of the DTU is less than 5 W, and the rated DC voltage is 24 V, thus, the current of the supply power line is below 0.2 A. When the value of the sampling resistor is less than 0.5 , the signal-to-noise ratio (SNR) of collected power traces is low, which cannot accurately reflect the running program in the device. To improve the SNR, we increase the value of the resistor to 1 without affecting the normal operation of DTU. It is also worth mentioning that the power acquisition equipment, power supply module, and sampling resistor need to share the same ground to avoid interference, and at the same time, shielded wires also should be used. Our experiments show that the resistor scheme can retain the information of the original power signal well.
3.5. Software Design
When the power consumption signal is transmitted to the computer, the signal needs to be analyzed by binary or multiclass classification to infer whether the executed program has been changed or which program is running. The key challenge of detection is to find the best features combination and choose a lightweight algorithm that can reduce the system resources in the condition of high detection accuracy. The designed algorithm is illustrated in Algorithm 1.
3.5.1. Preprocessing
We first add the low pass filter to remove the DC signal and signals with frequencies below 70 Hz as the problem of power frequency interference is serious. After that, we divide the signal into fragments of the same time length (), and there is an overlap between adjacent power fragments (), and last, extract primary features from these power samples.
3.5.2. Feature Extraction and Selection
Feature extraction. We choose as many features as possible to keep useful information from the power traces. In this paper, we use the PyaudioAnalysis [
27] to calculating 34 types of features including Zero Crossing Rate, Energy, etc. We then constructed an original 34-dimension vector
= {ZCR, Energy, …, Chroma Deviation} for each power consumption trace.
Feature selection. To remove the redundant features that are not that significant and get a better detection performance with less training and testing time, we select the most relevant features using the RandomForest algorithm, which is a supervised machine learning algorithm. Instead of searching for the most important features, the RandomForest searches for the best features in a random subset of features, which leads to a better candidate feature set. So we find a set of basis vectors that can replace the original feature set.
3.5.3. Training and Testing
Training. The binary classification algorithm divides the features of power traces into two categories as normal samples (normal program) and abnormal samples (four types of attacks), thus, the binary classification can only detect whether the executed program has been tampered and cannot detect which attack. In order to identify what kind of tampering the executed program has suffered, we need to use a multiclass classification algorithm in which the testing power trace is divided into exactly one of five classes. In this paper, we adopt the two algorithms to satisfy different requirements.
Testing. In the process of the detection process, we collect the power consumption traces of DTU in real-time and calculate the power features, which will feed into the trained models to detect whether the executed program has been falsified or what kind of attack the device suffered.
5. Discussion
The results presented in this work demonstrate the feasibility of using power consumption to detect the anomaly application running in the terminal devices in a non-intrusive and lightweight way. The results show the close relationship between the running programs and power consumption. That is, the power consumption changed with the changing program of DTU, therefore, the power consumption can be analyzed through machine learning to realize the detection of attacks without introducing security issues and affecting the real-time performance of the system compared to the traditional methods.
Zeus [
5] presents a contactless, passive, and non-intrusive control flow integrity monitoring solution for PLCs. Zeus identifies malicious code execution through side-channel analyses of the controller’s electromagnetic emanation signals. The programs used for classification fall in the classes of vector arithmetic, numerical methods, control algorithms, cryptography, signal processing, and communications. As we can see, these programs have totally different functions, which can be identified by the operator when these programs are running, and because of the great difference of program function, the power consumption will also vary greatly. Compared to our method, Zeus did not show the detection performance of the attacks in which the function is the same as the normal programs. Haider et al. propose a malware detection system for critical embedded and cyber-physical systems (CPS) by exploiting electromagnetic (EM) side-channel signals from the device to detect malicious activity [
35]. They separately evaluate the system with different malware behavior on different applications using an Altera Nios-II soft-processor and an A13-OLinuXino—a Cortex A8 ARM processor. Compared to our work, they did not show the detection performance on multiprocessors. RFDIDS [
36] proposed an air-gapped distributed intrusion detection system that monitors the substation activities by radio frequency (RF) measurements to verify the correctness of the SCADA network traffic. Compared to our approach, they monitor network traffic. Liu et al. proposed a low-cost code execution tracking method, which can monitor the code changes of embedded devices through the power side channel. He builds a hidden Markov model to model the power data of instruction execution and recovered the most likely instruction sequence with the modified Viterbi algorithm [
14]. However, their method was aimed at the embedded MCU chip with simple 8051 architecture, while the smart grid terminal equipment is much more complex both in hardware and software.
Although there are important discoveries revealed by these studies, there are also limitations. First, the proposed method can detect whether the running program is a normal or malicious one, but we do not know if it can detect the injection process of malicious code. Second, the detection performance of the anomaly detection model against unknown attacks also needs to be explored. Third, due to the fluctuation of the grid voltage, the output voltage of the DC power supply will also change slightly; thus, the power consumption of the terminal device also changes accordingly, and the future work has to explore the performance of the long-time detection.
In summary, the studies in this paper thus offer a new strategy to detect the anomaly behavior of the terminal devices in the smart grid. The results also illustrate the feasibility of monitoring multi-core CPU or multi-type CPU devices using power side-channel information. In future work, we also need to explore the performance of our proposed algorithm on attack code injection, unknown attacks, and long-term attack detection.