Detecting Encrypted and Unencrypted Network Data Using Entropy Analysis and Confidence Intervals
Abstract
1. Introduction
 An interval for encrypted data;
 An interval for DSCT-format data.
2. Methods for Detecting Encrypted and Unencrypted Data from the Network
2.1. Statistical Methods to Detect Encrypted and Unencrypted Data from the Network
2.1.1. Using Entropy to Classify Data from the Network
2.1.2. The Use of Other Statistical Methods to Detect Encrypted Data and Clear Data from the Network
2.2. Methods That Use Machine Learning to Detect Encrypted Data and Data in the Clear
2.3. In-Depth Statistical Parameters Used in Neural Networks
2.4. The Nearest Neighbor Method
2.5. Using the Support Vector Machine Model on Local Entropy
2.6. Parameter Estimation Using the Monte Carlo Method
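As an illustration of Monte Carlo parameter estimation in this setting, the sketch below estimates the mean and standard deviation of the byte entropy of uniformly random messages of a given length, a common stand-in for ciphertext. It makes visible that short encrypted messages stay well below the 8 bits/byte maximum (a message of n < 256 bytes cannot exceed log2(n) bits/byte). The function names, trial count, and seed are our own choices, not the authors'; `random.Random.randbytes` requires Python 3.9 or later.

```python
import math
import random
import statistics
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def mc_entropy_stats(length: int, trials: int = 500, seed: int = 1) -> tuple:
    """Monte Carlo estimate of (mean, stdev) of the entropy of random messages.

    Each trial draws `length` uniformly random bytes (a stand-in for ciphertext).
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    samples = [byte_entropy(rng.randbytes(length)) for _ in range(trials)]
    return statistics.fmean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    for n in (64, 256, 1500):
        mean, sd = mc_entropy_stats(n)
        print(n, round(mean, 3), round(sd, 3))
```

Estimates of this kind could serve as length-dependent reference values, since a fixed entropy threshold would misclassify short encrypted messages.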
3. The Proposed Method for Detecting Encrypted Data and Clear Data from the Network
 Generating confidence intervals that can be used to distinguish encrypted data from plain data;
 Proposing an algorithm that estimates the entropy of the data to be classified, computes its standard deviation, and places the data in the appropriate category according to the confidence interval to which that standard deviation belongs;
 Evaluating the proposed algorithm.
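The second step relies on two per-message statistics. A minimal sketch follows, assuming (our assumption, not necessarily the authors' exact definitions) that the entropy is the Shannon entropy of the byte-value histogram in bits per byte and that the standard deviation is taken over the counts of the 256 possible byte values. The function names and sample messages are hypothetical.

```python
import math
import os
import statistics
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)  # iterating bytes yields ints 0..255
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def byte_count_stdev(data: bytes) -> float:
    """Population standard deviation of the counts of all 256 byte values.

    Hypothetical feature: near-uniform (e.g. encrypted) data gives small values,
    structured cleartext gives large ones.
    """
    counts = Counter(data)
    return statistics.pstdev([counts.get(b, 0) for b in range(256)])


if __name__ == "__main__":
    random_like = os.urandom(4096)  # stand-in for ciphertext
    clear = b'{"sensor": "temp", "value": 21.5}' * 128  # stand-in for a DSCT message
    print(round(byte_entropy(random_like), 2), round(byte_count_stdev(random_like), 2))
    print(round(byte_entropy(clear), 2), round(byte_count_stdev(clear), 2))
```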
3.1. Generation of Confidence Intervals Used in the Detection of Encrypted Data and Data in the Clear Form
 A sample containing only encrypted data, used to determine the interval ${\Im}_{Cr}^{\sigma}$;
 A second sample containing only cleartext data, used to determine the interval ${\Im}_{Cl}^{\sigma}$.
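A sketch of building the two intervals from labeled samples. The exact construction used in the paper is not reproduced in this section, so we assume a normal-approximation band, mean ± z·s, over the per-message standard-deviation values of each sample; for roughly normal values, z = 1.96 covers about 95% of individual messages. Note that this is a prediction-style band, not the textbook confidence interval of the mean (which would shrink with sample size and be too narrow for classifying single messages). `build_interval` and the sample values are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def build_interval(values: Sequence[float], z: float = 1.96) -> Tuple[float, float]:
    """Return (mean - z*s, mean + z*s) for a labeled sample of feature values.

    Assumption: under approximate normality this band contains about 95% of
    individual values for z = 1.96.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return (mean - z * s, mean + z * s)


# Hypothetical per-message standard deviations from the two training samples.
encrypted_sigmas = [3.9, 4.1, 4.0, 3.8, 4.2, 4.0]
clear_sigmas = [48.0, 52.5, 50.1, 47.3, 55.0, 49.8]

interval_cr = build_interval(encrypted_sigmas)  # plays the role of I_Cr^sigma
interval_cl = build_interval(clear_sigmas)      # plays the role of I_Cl^sigma
print(interval_cr, interval_cl)
```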
3.2. Detection of Encrypted Data and Clear Data from the Network with the Help of Confidence Intervals
3.3. The Proposed Algorithm for Detecting Encrypted Data and Clear Data from the Network
Algorithm 1 Message Classification: Encrypted—DSCT—Unidentified—VTA 

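The body of Algorithm 1 is not reproduced in this section. Below is a minimal sketch of the decision rule listed at the start of Section 3 (a message is placed in the category whose interval contains its standard deviation), under our assumption that a value falling in both intervals, or in neither, is reported as Unidentified. The interval bounds are illustrative placeholders, not values from the paper, and the names are hypothetical.

```python
from enum import Enum
from typing import Tuple

Interval = Tuple[float, float]


class Label(Enum):
    ENCRYPTED = "Encrypted"
    DSCT = "DSCT"
    UNIDENTIFIED = "Unidentified"


def contains(interval: Interval, value: float) -> bool:
    """Closed-interval membership test."""
    lo, hi = interval
    return lo <= value <= hi


def classify(sigma: float, interval_cr: Interval, interval_cl: Interval) -> Label:
    """Assign a message to a category from its standard-deviation value sigma.

    Assumption: membership in both intervals, or in neither, yields UNIDENTIFIED.
    """
    in_cr = contains(interval_cr, sigma)
    in_cl = contains(interval_cl, sigma)
    if in_cr and not in_cl:
        return Label.ENCRYPTED
    if in_cl and not in_cr:
        return Label.DSCT
    return Label.UNIDENTIFIED


# Illustrative placeholder bounds (hypothetical, not taken from the paper).
I_CR = (3.7, 4.3)
I_CL = (44.8, 56.1)
for s in (4.0, 50.0, 20.0):
    print(s, classify(s, I_CR, I_CL).value)
```

Treating overlap as Unidentified keeps the two definite categories conservative; the cost is a larger Unidentified share when the intervals are wide.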
3.4. Evaluation of the Proposed Algorithm for Detecting Encrypted Data and Clear Data in the Network
 Case One:
 Generation of 10,000 messages, of which 5000 consist of encrypted data and 5000 of DSCT-type data;
 Case Two:
 Analysis of 5000 items from a data flow in which the applications generating the traffic are known;
 Case Three:
 Analysis of 5000 items from general data traffic, without knowledge of the applications that generate it. This type of analysis can also be adapted to other fields, such as the studies in [15,16,17]; to this end, collaborations have been opened with researchers in those fields.
3.5. Evaluation for Case One
 TVAD
 The total volume of analyzed data;
 CV
 Classification volume;
 PCT
 Percentage of correct detections for VTA proposed algorithm;
 DSCT
 Data structures transmitted over a communication channel in unencrypted format;
 True Positive e.
 True positive—encrypted (correct detection);
 True Positive c.
 True positive—in clear (correct detection);
 Encrypted False p.
 Encrypted false positives (actually: DSCT);
 False Positive c.
 False positives—clear (actually: encrypted);
 False Positive u.
 False positives—unidentified (actually: DSCT).
 DbScan:
 A clustering algorithm. For this case, the maximum detection percentage reported for the solution in the ideal case of analyzed data was used; the data were taken from the tests presented in [9].
 LibSVM:
 A library of support vector machine algorithms, applied here to the detection of encrypted data. The detection results for ideal conditions were considered; the data were taken from the tests presented in [9].
 EnCod:
 A solution proposed in [11]; the ideal data case treated in that article was considered.
 kNN:
 A solution proposed in [12]. Data for the ideal case were considered.
 SBE:
 A solution proposed in [13]. The ideal case was considered, namely known data (more precisely, audio files) with the data length that the study specifies as optimal.
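The PCT column can be checked against the counts in the result tables: it equals the true positives for both message types divided by the 10,000 analyzed messages, e.g. (4895 + 4902) / 10,000 = 97.97%. The short script below recomputes PCT for the three result tables in the order they appear; the dictionary keys and the function name are our own.

```python
# Correct detections per message type, copied from the three result tables.
RESULTS = {
    "first": {"encrypted_tp": 4895, "dsct_tp": 4902, "total": 10_000},
    "second": {"encrypted_tp": 4588, "dsct_tp": 4717, "total": 10_000},
    "third": {"encrypted_tp": 3051, "dsct_tp": 3289, "total": 10_000},
}


def pct(encrypted_tp: int, dsct_tp: int, total: int) -> float:
    """Percentage of correct detections over all analyzed messages."""
    return 100.0 * (encrypted_tp + dsct_tp) / total


for name, r in RESULTS.items():
    print(f"{name} table: PCT = {pct(**r):.2f}%")
# first table: PCT = 97.97%
# second table: PCT = 93.05%
# third table: PCT = 63.40%
```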
3.6. Evaluation for Case Two
 DbScan:
 The data were taken from the tests presented in [9].
 LibSVM:
 Results for data of known types were considered. The data were taken from the tests presented in [9].
 EnCod:
 Data of known types, as treated in [11], were considered.
 kNN:
 Data of known types, as treated in [12], were considered; the authors used 3000 training packets and 700 test packets.
 SBE:
 Data of known types, as treated in [13], were considered.
3.7. Evaluation for Case Three
4. Limitations of the Proposed Model
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
IoT  Internet of Things 
DSCT  Data structures (messages) transmitted over a communication channel in unencrypted format 
ReLU  Rectified linear unit activation function 
SeLU  Scaled exponential linear unit activation function 
References
 1. Goubault-Larrecq, J.; Olivain, J. Detecting Subverted Cryptographic Protocols by Entropy Checking; Laboratoire Spécification et Vérification: Cachan, France, 2006.
 2. Wood, D.; Apthorpe, N.; Feamster, N. Cleartext Data Transmissions in Consumer IoT Medical Devices. In Proceedings of the 2017 Workshop on Internet of Things Security and Privacy (IoTS&P ’17), Dallas, TX, USA, 3 November 2017; pp. 7–12.
 3. Cha, S.; Kim, H. Detecting Encrypted Traffic: A Machine Learning Approach. In Proceedings of the 17th International Workshop (WISA 2016), Jeju Island, Korea, 25–27 August 2016.
 4. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423.
 5. Dorfinger, P. Real-Time Detection of Encrypted Traffic Based on Entropy Estimation. Master’s Thesis, Salzburg University of Applied Sciences, Salzburg, Austria, August 2010.
 6. Fawcett, T.W. ExFILD: A Tool for the Detection of Data Exfiltration Using Entropy and Encryption Characteristics of Network Traffic. Master’s Thesis, University of Delaware, Newark, DE, USA, 2010.
 7. Malhotra, P. Detection of Encrypted Streams for Egress Monitoring. Master’s Thesis, Iowa State University, Ames, IA, USA, 2007.
 8. Casino, F.; Choo, K.K.R.; Patsakis, C. HEDGE: Efficient Traffic Classification of Encrypted and Compressed Packets. IEEE Trans. Inf. Forensics Secur. 2019, 14, 2916–2926.
 9. Mamun, M.S.I.; Ghorbani, A.A.; Stakhanova, N. An Entropy Based Encrypted Traffic Classifier. In Proceedings of the 17th International Conference on Information and Communications Security (ICICS 2015), Beijing, China, 9–11 December 2015; pp. 282–294.
 10. Zhou, K.; Wang, W.; Wu, C.; Hu, T. Practical evaluation of encrypted traffic classification based on a combined method of entropy estimation and neural networks. ETRI J. 2020, 42, 311–323.
 11. De Gaspari, F.; Hitaj, D.; Pagnotta, G.; De Carli, L.; Mancini, L.V. Reliable detection of compressed and encrypted data. Neural Comput. Appl. 2022, 34, 20379–20393.
 12. Hahn, D.; Apthorpe, N.; Feamster, N. Detecting Compressed Cleartext Traffic from Consumer Internet of Things Devices. arXiv 2018, arXiv:1805.02722v1.
 13. Tang, Z.; Zeng, X.; Sheng, Y. Entropy-based feature extraction algorithm for encrypted and non-encrypted compressed traffic classification. Int. J. Innov. Comput. Inf. Control 2019, 15, 845–860.
 14. Zhai, J.; Shi, H.; Wang, M.; Sun, Z.; Xing, J. An Encrypted Traffic Identification Scheme Based on the Multilevel Structure and Variational Automatic Encoder. Secur. Commun. Netw. 2020, 11, 1–10.
 15. Lavinia, D.; Elisabeta, A.; Maria, T.; Mihail, P.; Calina, S.S. Contribution of mechanical and electrical cardiovascular factors in patients with ischemic stroke. Pak. J. Pharm. Sci. 2020, 33, 2455–2460.
 16. Calina, S.S.; Elisabeta, A.; Lavinia, D. The importance of balance and postural control in the recovery of stroke patients. Balneo Res. J. 2020, 11, 372–378.
 17. Dutescu, M.M.; Popescu, R.E.; Balcu, L.; Duica, L.C.; Strunoiu, L.M.; Alexandru, D.O.; Pirlog, M.C. Social Functioning in Schizophrenia Clinical Correlations. Curr. Health Sci. J. 2019, 44, 151–156.
 18. Acu, A.M.; Maduta, A.; Otrocol, D.; Rasa, I. Inequalities for Information Potentials and Entropies. Mathematics 2020, 8, 2056.
 19. Acu, A.M.; Hodis, S.; Rasa, I. Estimates for the Differences of Certain Positive Linear Operators. Mathematics 2020, 8, 798.
Message Type  TVAD  Classification  CV
Encrypted  5000  True Positive e.  4895
      Encrypted False p.  26
      False Positive c.  52
      False Positive u.  27
DSCT  5000  True Positive c.  4902
      Encrypted False p.  24
      False Positive c.  22
      False Positive u.  52

PCT  DbScan  LibSVM  EnCod  kNN  SBE
$97.97\%$  $89.70\%$  $96.63\%$  $94\%$  $66.9\%$  $97.90\%$
Message Type  TVAD  Classification  CV
Encrypted  5000  True Positive e.  4588
      Encrypted False p.  254
      False Positive c.  124
      False Positive u.  34
DSCT  5000  True Positive c.  4717
      Encrypted False p.  119
      False Positive c.  72
      False Positive u.  92

PCT  DbScan  LibSVM  EnCod  kNN  SBE
$93.05\%$  $89.7\%$  $86.95\%$  $92\%$  $60\%$  $72\%$
Message Type  TVAD  Classification  CV
Encrypted  5000  True Positive e.  3051
      Encrypted False p.  328
      False Positive c.  324
      False Positive u.  1297
DSCT  5000  True Positive c.  3289
      Encrypted False p.  284
      False Positive c.  217
      False Positive u.  1210

PCT  DbScan  LibSVM  EnCod  kNN  SBE
$63.40\%$  $63\%$  $61\%$  n/a  $58.5\%$  $65\%$
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ticleanu, O.A.; Popa, T.; Hunyadi, D.I.; Constantinescu, N. Detecting Encrypted and Unencrypted Network Data Using Entropy Analysis and Confidence Intervals. Entropy 2023, 25, 397. https://doi.org/10.3390/e25030397