Sensors
  • Communication
  • Open Access

21 June 2021

Influence of Features on Accuracy of Anomaly Detection for an Energy Trading System

1 Instituto Superior de Engenharia do Porto, Instituto Politecnico do Porto, R. Dr. Antonio Bernardino de Almeida, 431, 4249-015 Porto, Portugal
2 College of Basic & General Education, Chosun University, 309 Pilmundae-ro, Dong-Gu, Gwangju 61452, Korea
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
This article belongs to the Collection Intelligent Security Sensors in Cloud Computing

Abstract

The main problem with conventional feature-based anomaly signal detection is that it is difficult to use in real time, because it requires processing of the network signals. Analyzing network signals in real time requires a vast amount of processing for each signal, as each protocol carries various pieces of information. This paper proposes anomaly detection based on analyzing the relationships among the features supplied to the anomaly detection model. The model judges whether a network signal is anomalous based on the detection of anomaly features. The selected feature does not require constant network signal updates or real-time processing of those signals. When the selected feature is found in a received signal, the signal is registered as a potential anomaly signal and is then monitored continuously until it is determined to be either an anomaly or a normal signal. In the results, the model determined anomalies with 99.7% (0.997) accuracy for class S0 of flag(f4), and for class REJ of flag(f4), which received 11,233 signals, the accuracy of the normal-or-anomaly judgment was 98.7% (0.987).

1. Introduction

Many servers are potential victims of cyber-attacks, and an energy trading system is one example. In [1], energy transactions are carried out within a transparent network that consists of smart factories, solar systems, home solar systems, wearable devices, smart IoT devices, smart vehicles, and personal offices in smart cities. The parties trade energy on the energy network by sending each other messages such as requests and replies. Although the transparency of the network protects the parties from direct hacking, the energy network can be monitored, and an attacker can use the collected network data to prey on other parties anonymously [2]. To reveal the perpetrator, the system has to detect abnormal signals among the energy trading signals and alert security. This paper proposes anomaly detection based on analyzing the relationships among the features supplied to the anomaly detection model. The model judges whether a network signal is anomalous based on the detection of abnormal features. The selected feature does not require constant network signal updates or real-time processing of those signals. When the selected feature is found in a received signal, the signal is registered as a potential anomaly signal and is then monitored continuously until it is determined to be either an anomaly or a normal signal. The model thus functions as surveillance for anomaly features in network signals: it detects and tracks potentially abnormal signals until their nature is determined. This study analyzes the features of network signals and compares the results of the feature analysis for anomaly detection. The approach can be used as a precautionary measure for anomaly detection and is expected to help secure energy trading systems that use energy networks.
Works related to the security issue are introduced in Section 2, followed by the definition of the proposed ADM in Section 3. Section 4 defines the features used by the model. This study focuses on service(f3) and flag(f4), and the varied performance levels of models trained on these features are given in Section 5. The paper is concluded in Section 6.

3. Anomaly Detection Model (ADM)

3.1. The Collection of Network Signals

Figure 3 depicts energy generators, such as the solar system and the home solar system, sending a SHARE message. Depending on supply levels, the home solar system can at times sell surplus energy, and vice versa [12]. To buy energy, a buyer sends a request message to obtain the available cost and the available amount from the ETM (Energy Trade Model). The buyer receives this information, and the trade proceeds in the ETM [1] once the buyer receives a reply message from the ETM.
Figure 3. ADM (Anomaly Detection Model).
The ADM (Anomaly Detection Model) consists of the following steps: Network Signal Collection, Feature Analysis, Detection, and Update (Figure 3). The first step is network signal collection. The feature analysis step analyzes the relationships among the features. Based on the feature analysis, the model decides which signals are anomalies and then updates the stored signals as $SIG_{as}$ or $SIG_{ns}$, where $SIG_{as}$ is an Anomaly Signal (as) and $SIG_{ns}$ is a Normal Signal (ns). At the network signal collection stage depicted in Figure 4, the collection algorithm ADM encodes $x_i$ from the user's device for the data collector and collects $Y_i \leftarrow A(X_i)$ as output. The network signals, including both normal and anomaly signals, are collected as an array and stored in a DB (Figure 3). The signals are collected from each user and processed by the ADM [13]. Here, M is a mean, E is an exponent, e is Euler's number, and A is an array.
Figure 4. Collection of the Network Signal.
As the first step, the collection algorithm ADM encodes $x_i$ from the user's device for the data collector, and the collected output is $Y_i = A(X_i)$. The algorithm needs to guarantee a form of plausible deniability: whatever output is collected from a user, it should be approximately as likely to have come from one specific value $x$ as from any other $x$. The ADM is defined as follows:
$$
Y_i = \mathrm{ADM}(X_i) =
\begin{cases}
1, & \text{with probability } \dfrac{1}{e^{E}+1} + \dfrac{x_i}{M}\cdot\dfrac{e^{E}-1}{e^{E}+1} \\
0, & \text{otherwise.}
\end{cases}
$$
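The encoding rule can be illustrated with a minimal Python sketch. It assumes the probability has the form 1/(e^E + 1) + (x_i/M)(e^E − 1)/(e^E + 1), and that x_i lies between 0 and M so that the value is a valid probability. The function names, the clamping step, and the example values are hypothetical and not taken from the paper.

```python
import math
import random


def response_probability(x: float, m: float, eps: float) -> float:
    """Probability of reporting 1 for a value x, following the ADM expression."""
    e_eps = math.exp(eps)
    p = 1.0 / (e_eps + 1.0) + (x / m) * (e_eps - 1.0) / (e_eps + 1.0)
    # Assumption: clamp to [0, 1] in case x falls outside [0, m].
    return min(1.0, max(0.0, p))


def adm_encode(x: float, m: float, eps: float, rng: random.Random) -> int:
    """Return the single bit Y_i that the data collector receives."""
    return 1 if rng.random() < response_probability(x, m, eps) else 0


# Hypothetical example values; the fixed seed only makes the sketch repeatable.
rng = random.Random(0)
bits = [adm_encode(x=3.0, m=10.0, eps=1.0, rng=rng) for _ in range(1000)]
```

Under these assumptions, the probability of reporting 1 ranges from 1/(e^E + 1) at x = 0 to e^E/(e^E + 1) at x = M, so the two extremes differ by a factor of e^E. That bounded ratio is what gives each user plausible deniability about the value behind a reported bit.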

3.2. Analysis of Network Signals

Network signals are raw signals that have not yet been classified. These signals can be fed into the model periodically or non-periodically (Figure 4). In the first step, all network signals arrive in no particular order. To apply ML and/or DL, the raw data must be pre-processed, because the data are received randomly or irregularly; that is, the flow of network signals follows no rules. As the data flow in Figure 1 shows, the physical system contains normal signals and anomaly signals in a time series, $s(t) = s(t_0 + n\tau_s) = s(n)$, which is sampled at intervals of $\tau_s$ starting at $t_0$ [14]. These signals become visible through the initial pre-processing. Before the analysis reaches a decision, however, we do not know which signal is normal and which is an anomaly. The algorithm needs pre-processed data for the analysis, but the flow of the data looks random, as in a chaotic system [15]. We also surveyed existing research, but the existing approaches are not suitable for real-time use, and detecting an anomaly signal in a network has to happen in real time. The KDDCup dataset was produced after computation, which means that it is not a real-time dataset. For real-time use, we therefore have to select a base feature. The base feature must meet certain qualifications: it should not be derived through computation, it should be captured directly from raw data, and it should be simple to obtain in the first step.
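The sampling relation s(n) = s(t_0 + n τ_s) can be made concrete with a short Python sketch. The helper names and the use of a sine wave as a stand-in signal are assumptions for illustration only.

```python
import math


def sample_times(t0: float, tau_s: float, count: int) -> list:
    """Time instants t0 + n * tau_s for n = 0 .. count - 1."""
    return [t0 + n * tau_s for n in range(count)]


def sample_signal(signal, t0: float, tau_s: float, count: int) -> list:
    """Discrete series s(n) obtained by evaluating signal(t) at each sample time."""
    return [signal(t) for t in sample_times(t0, tau_s, count)]


# Hypothetical stand-in for a continuous network signal.
samples = sample_signal(math.sin, t0=0.0, tau_s=0.5, count=4)
```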

4. Analysis

4.1. Feature Analysis

To simulate our proposal, we used WEKA (ver. 3.8.5) with the KDDTrain dataset [16]. We set ‘Discretize’ in the Preprocess panel and ‘J48’ as the classifier, and we used 80% of the dataset for training and 20% for testing. The KDDTrain dataset is a dataset composed of 17 features from KDDCup, introduced in [16]. In this study, the 15 selected features were grouped according to how they are created and calculated. For example, count(f23) is obtained by calculating the sum of connections to the same destination, whereas features such as service(f3) and flag(f4) are obtained by analyzing the captured protocol without any calculation (Table 1). As defined in Table 1, service(f3) and flag(f4) are the features that are not defined by mathematical calculations. Table 2 shows the accuracy obtained with each feature. The accuracy of the analysis by class of service(f3) was 72.564%, and this analysis was considered meaningless because all network traffic, including attack signals, relies on the services in service(f3). The experiments with flag(f4), on the other hand, all returned 99.120% accuracy. The lower score for service(f3) arises because it is composed of the applications, instructions, or protocols in a network, resulting in more than 50 classes. Since both normal signals and attack signals use these services in the ordinary way, the analysis of service(f3) was concluded to be meaningless. Instead, the study focuses on analyzing the relationship between the classes of flag(f4), the only feature that is not obtained through complex calculations. Each class of flag(f4) is defined in Table 3. In addition, descriptions of all 42 features are provided.
Table 1. Feature Definition.
Table 2. Accuracy to the number of features.
Table 3. Flag Code.
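To illustrate how the accuracy of a single nominal feature such as flag(f4) can be measured on an 80/20 split, the following Python sketch uses a per-value majority vote as a rough stand-in for a classifier trained on one feature. It is not WEKA's J48, and the toy records are hypothetical rather than taken from KDDTrain.

```python
from collections import Counter, defaultdict


def fit_majority_by_value(rows):
    """Map each feature value to its most common label; also return a fallback label."""
    by_value = defaultdict(Counter)
    overall = Counter()
    for value, label in rows:
        by_value[value][label] += 1
        overall[label] += 1
    table = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
    return table, overall.most_common(1)[0][0]


def accuracy(table, default, rows):
    """Fraction of rows whose label matches the prediction for their feature value."""
    hits = sum(1 for value, label in rows if table.get(value, default) == label)
    return hits / len(rows)


# Hypothetical (flag, label) records, for illustration only.
records = (
    [("S0", "anomaly")] * 10
    + [("SF", "normal")] * 10
    + [("REJ", "anomaly")] * 3
    + [("REJ", "normal")] * 2
)

# Deterministic 80/20 split: every fifth record goes to the test set.
test_rows = [r for i, r in enumerate(records) if i % 5 == 4]
train_rows = [r for i, r in enumerate(records) if i % 5 != 4]

table, default = fit_majority_by_value(train_rows)
score = accuracy(table, default, test_rows)
```

In this toy data, the mixed REJ class is the only source of error, which mirrors the idea that classes with a clear majority label are the most useful ones for flagging signals.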

4.2. Correlation of Each Feature

The main dataset, KDDCup, contains 42 features, and KDDTrain was abstracted from KDDCup using the 15 features in Table 1. The table categorizes the features into those that require calculation (13 features) and those that do not (2 features). For the two features that need no calculation, the feature value can be obtained simply by capturing the network signals, such as the protocol in transit, and the two features work independently of each other. The 13 features that need calculation, on the other hand, have to be computed by sharing values among the features. Figure 5 shows the correlation of each feature, that is, which values are transferred to which features. Basically, f25, f27, f28, f38, f39, f40, and f41 need to use a value from f4, and f23 shares the ‘Sum of connections to the same destination IP address’ with f29. f33 sends the ‘Sum of connections to the same destination port number’ to f39 and f41, and f32 sends the ‘Sum of connections to the same destination IP address’ to f38, f40, f35, and f34. Table 1 contains all features and all classes of each feature. f3 defines all network services, such as HTTP, FTP, SMTP, telnet, and others. Because all users, including attackers, use the normal network services, an analysis of f3 is of little use. Finally, analyzing f4 and (f25, f27, f28)f23, f4 and (f39, f41)f33, and f4 and (f38, f40)f32 should be the priority when trying to detect anomaly signals in a network.
Figure 5. Correlation of each feature.
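The value-sharing relations described in the text can be written down as a small directed graph. The Python sketch below encodes only the edges stated in the prose, so any additional edges shown in Figure 5 are not included. The dictionary and helper names are hypothetical.

```python
# Source feature -> features that receive a value from it (edges from the prose only).
SENDS = {
    "f4": ["f25", "f27", "f28", "f38", "f39", "f40", "f41"],
    "f23": ["f29"],
    "f33": ["f39", "f41"],
    "f32": ["f38", "f40", "f35", "f34"],
}


def _num(feature: str) -> int:
    """Numeric part of a feature name, used for ordering (for example 'f39' -> 39)."""
    return int(feature[1:])


def inputs_of(feature: str) -> list:
    """Features whose values the given feature depends on."""
    return sorted((src for src, dsts in SENDS.items() if feature in dsts), key=_num)


def receivers_of_all(sources: list) -> list:
    """Features that receive a value from every one of the listed sources."""
    common = set.intersection(*(set(SENDS[s]) for s in sources))
    return sorted(common, key=_num)
```

For example, `receivers_of_all(["f4", "f33"])` returns f39 and f41, and `receivers_of_all(["f4", "f32"])` returns f38 and f40, matching two of the feature groups that the text singles out as priorities.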

5. Discussion

Network anomaly signal discovery should operate in real time. Real-time data should therefore be applied continuously to the anomaly signal analysis results (a dataset that continues to be updated) in order to discover anomaly signals. Real-time analysis must use features that can be found immediately in network signals without prior processing. In this study, we first identified service(f3) and flag(f4) as candidate features. As a result, flag(f4) (99.120%) was chosen over service(f3) (72.564%) as the basis feature for anomaly signal discovery (Table 2). Refer to Table 2 for further information on the experimental results, and to Table 4 for the class types defined by flag(f4) and the compiled results of the experiments.
Table 4. Detailed accuracy by class with a flag.
For real-time anomaly signal detection, the system must simultaneously keep the signal analysis results up to date and determine the basis feature for each received signal. As mentioned above, flag(f4) was chosen from the KDDTrain dataset used in this paper. Figure 5 shows the relationship between the 15 features used, and Table 5 summarizes the counts for each class of flag(f4). The results according to Table 5 are shown in Figure 6, where red stands for “anomaly” and blue stands for “normal”. The figure covers the 11 classes defined by the flag: OTH, REJ, RSTO, RSTOS0, RSTR, S0, S1, S2, S3, SF, and SH. First, S0 received 34,851 signals, and they were all determined to be anomalies (red). Based on Table 4, S0 determined an anomaly with 99.7% (0.997) accuracy. REJ received 11,233 signals, with a normal-or-anomaly judgment accuracy of 98.7% (0.987).
Table 5. flag(f4) count.
Figure 6. Analysis between service and flag.
Figure 7 shows the analysis between features. Based on this analysis, the ADM proposed in this work analyzes the received network signals and concludes that the user responsible for a signal requires closer observation if the class of the flag is S0 or REJ. Furthermore, when an analysis based on S0 alone is not considered reliable enough, a combination of REJ, RSTO, and RSTR can lead to higher anomaly detection accuracy.
Figure 7. Analysis between features.
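The register-then-monitor behavior described above can be sketched in Python as follows. The class name, the method names, and the use of a source identifier as the key are hypothetical; only the choice of S0 and REJ as watched flag classes comes from the text.

```python
WATCH_FLAGS = {"S0", "REJ"}  # flag classes that the discussion singles out


class Watchlist:
    """Tracks sources whose signals carried a watched flag until a verdict is reached."""

    def __init__(self, watch_flags=WATCH_FLAGS):
        self.watch_flags = set(watch_flags)
        self.pending = {}  # source id -> flags seen since registration

    def observe(self, source: str, flag: str) -> bool:
        """Record a signal; return True if the source is under observation."""
        if source in self.pending:
            self.pending[source].append(flag)
            return True
        if flag in self.watch_flags:
            self.pending[source] = [flag]  # register as a potential anomaly
            return True
        return False

    def resolve(self, source: str, verdict: str):
        """Stop monitoring once the signal is judged 'anomaly' or 'normal'."""
        return verdict, self.pending.pop(source, [])
```

A source is registered only when one of its signals carries a watched flag. After that, every signal from the source is recorded until `resolve` is called with the final judgment, which would come from the detection step of the ADM rather than from this sketch.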

6. Conclusions

In this paper, we have conducted a feature-based accuracy analysis. The problem with existing methods is that real-time processing for anomaly signal discovery is challenging. To address this problem, we propose an update of the anomaly signals that is centered on the features, together with a method that detects anomaly signals based on selected features that can be obtained from raw data. In this study, the features that could be selected from raw data were service(f3) and flag(f4), and flag(f4) was selected over service(f3) for its higher accuracy score. Nevertheless, the selected feature can be changed depending on the situation. Since network signals with differing characteristics are received in various places, ML-based classification, as used in conventional methods, is not appropriate for real-time anomaly signal discovery. We therefore propose applying features that can be acquired from raw data to the ADM, so that the model registers and monitors signals in advance when features that are closely associated with abnormal signals are found. In the results, the model determined anomalies with 99.7% (0.997) accuracy for class S0 of flag(f4), and for class REJ of flag(f4), which received 11,233 signals, the accuracy of the normal-or-anomaly judgment was 98.7% (0.987). Future work is required on the optimal selection of features for detecting more diverse abnormal signals.

Author Contributions

H.K.: conceptualization of this study, methodology; K.R.: analysis of results with mathematics; I.P.: project administration. All authors have read and agreed to the published version of the manuscript.

Funding

This work has received funding from FEDER Funds through COMPETE program and from National Funds through FCT under the project SPET—PTDC/EEI-EEE/029165/2017.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data sharing not applicable.

Acknowledgments

This work has received funding from FEDER Funds through COMPETE program and from National Funds through FCT under the project SPET—PTDC/EEI-EEE/029165/2017. This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2019R1I1A3A01063132).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ADM    Anomaly Detection Model
CNN    Convolution Neural Network
ML     Machine Learning
DL     Deep Learning
I/F    Interface
AEWMA  Adaptive Exponentially Weighted Moving Average
EWMA   Exponentially Weighted Moving Average
AS     Anomaly Signal
NS     Normal Signal

References

  1. Ko, H.; Praca, I. Design of a Secure Energy Trading Model Based on a Blockchain. Sustainability 2021, 13, 1634.
  2. Sun, C.C.; Cardenas, D.J.S.; Hahn, A.; Liu, C.C. Intrusion Detection for Cybersecurity of Smart Meters. IEEE Trans. Smart Grid 2020, 12, 612–622.
  3. Samie, F.; Bauer, L.; Henkel, J. From cloud down to things: An overview of machine learning in internet of things. IEEE Internet Things J. 2019, 6, 4921–4934.
  4. Restuccia, F.; D’Oro, S.; Melodia, T. Securing the internet of things in the age of machine learning and software-defined networking. IEEE Internet Things J. 2018, 5, 4829–4842.
  5. Handa, A.; Sharma, A.; Shukla, S.K. Machine learning in cybersecurity: A review. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2019, 9, e1306.
  6. Sarao, P. Machine learning and deep learning techniques on wireless networks. Int. J. Eng. Res. Technol. 2019, 12, 311–320.
  7. Zarpelão, B.B.; Miani, R.S.; Kawakani, C.T.; de Alvarenga, S.C. A survey of intrusion detection in Internet of Things. J. Netw. Comput. Appl. 2017, 84, 25–37.
  8. Alabadi, M.; Celik, Y. Anomaly Detection for Cyber-Security Based on Convolution Neural Network: A survey. In Proceedings of the 2020 International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA), Ankara, Turkey, 26–28 June 2020; pp. 1–14.
  9. Liu, Q.; Li, P.; Zhao, W.; Cai, W.; Yu, S.; Leung, V.C. A survey on security threats and defensive techniques of machine learning: A data driven view. IEEE Access 2018, 6, 12103–12117.
  10. Xin, Y.; Kong, L.; Liu, Z.; Chen, Y.; Li, Y.; Zhu, H.; Gao, M.; Hou, H.; Wang, C. Machine learning and deep learning methods for cybersecurity. IEEE Access 2018, 6, 35365–35381.
  11. Tang, D.; Chen, K.; Chen, X.; Liu, H.; Li, X. Adaptive EWMA Method based on abnormal network traffic for LDoS attacks. Math. Probl. Eng. 2014, 2014, 496376.
  12. Chen, S.; Chen, B. Urban energy consumption: Different insights from energy flow analysis, input–output analysis and ecological network analysis. Appl. Energy 2015, 138, 99–107.
  13. Kotenko, I.; Saenko, I.; Lauta, O.; Kribel, A. An Approach to Detecting Cyber Attacks against Smart Power Grids Based on the Analysis of Network Traffic Self-Similarity. Energies 2020, 13, 5031.
  14. Abarbanel, H.D.; Frison, T.W.; Tsimring, L.S. Obtaining order in a world of chaos [signal processing]. IEEE Signal Process. Mag. 1998, 15, 49–65.
  15. Pedraza, A.; Deniz, O.; Bueno, G. Approaching Adversarial Example Classification with Chaos Theory. Entropy 2020, 22, 1201.
  16. Iglesias, F.; Zseby, T. Analysis of network traffic features for anomaly detection. Mach. Learn. 2015, 101, 59–84.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
