A Survey on TLS-Encrypted Malware Network Traffic Analysis Applicable to Security Operations Centers

Recently, the increasing adoption of transport layer security (TLS) encryption on the Internet has become a critical issue for network traffic analysis (NTA) in most security operations centers (SOCs). To address this issue, in this survey article, we present existing research on NTA and related areas, primarily focusing on TLS-encrypted traffic, to detect and classify malicious traffic with deployment scenarios for SOCs. As the main focus of our survey is NTA methods applicable to malware detection and family classification, security experts in SOCs and researchers in academia can obtain useful information from it. In particular, we discuss the pros and cons of three main deployment models for encrypted NTA: TLS interception, inspection using cryptographic functions, and passive inspection without decryption. In addition, we discuss the state-of-the-art methods in TLS-encrypted NTA for each component of the machine learning pipeline that these methods typically use.


Introduction
Over the last two decades, security operations centers (SOCs) have been established in many organizations (for example, enterprises, governments, and universities) that are frequent targets of cyberattacks. As the focal point for various security operations and computer network defense, an SOC is typically a group of security experts that conducts various security operations, including detection, analysis, response, reporting, and prevention of cybersecurity incidents [1]. Despite the practical importance of SOCs to organizations, especially in the last few years, the literature on the various issues in SOCs remains fragmented and scattered [2,3].
Among the large set of services provided by SOCs, network intrusion monitoring, detection, and analysis (or network security monitoring [4,5]) are highly relevant to NTA. From an academic perspective, NTA comprises inferential methods that obtain traffic-related information about end hosts, users, application processes, and protocols from network traces (e.g., captured packets in pcap format [6], flow records in NetFlow or IPFIX format [7,8], and more). With the rapid growth of the Internet, various NTA approaches have been investigated in different contexts, such as traffic engineering, network security, accounting, and advertising. Using these methods, security experts in SOCs can achieve goals such as intrusion (or malware traffic) detection and traffic classification. According to a survey conducted by ENEA Qosmos Division [9], 87% of the participating security experts are familiar with NTA, and the majority of their organizations already use NTA.
Recently, however, the majority of SOCs have been facing a critical issue with NTA: the increasing adoption of traffic encryption on the Internet [10,11]. Although the secure sockets layer (SSL), the predecessor of transport layer security (TLS), appeared in the mid-1990s to provide end-to-end communication privacy over the Internet, only 44.3% of web connections of European residential customers used HTTPS (a secure version of the HyperText Transfer Protocol using SSL/TLS) in September 2014 [12]. In October 2021, however, Google [13] and Let's Encrypt [14] reported that more than 80% of the web pages loaded in the Chrome and Firefox browsers (by users who allow the sharing of usage statistics) used HTTPS, primarily owing to the push by major web browsers and community efforts such as Let's Encrypt [15] and HTTPS Everywhere [16]. As a result of these trends, existing NTA methods that rely primarily on application layer payload processing (for example, rule-based or signature-based intrusion detection, and deep packet inspection) lose their utility for encrypted traffic [17].
In this survey article, we present existing research on NTA and related areas, primarily focusing on TLS-encrypted malware traffic, that can be utilized by security experts in SOCs. While there are multiple related surveys [18][19][20][21][22][23][24][25][26][27] on NTA and traffic classification, our survey makes the following distinct contributions:
• TLS is a widely used end-to-end encryption protocol with a wide variety of applications in diverse configurations [28]. Moreover, various malware families (especially Trickbot and Dridex) abuse TLS encryption [29,30], which has been one of the biggest challenges faced by SOCs in recent years. Furthermore, the fraction of TLS-encrypted flows among malware flows is increasing dramatically: industrial reports stated that nearly half of malware used TLS as of April 2021 [31] and that 91.5% of malware arrived over TLS-encrypted traffic in the second quarter of 2021 [32]. As we primarily discuss NTA methods applicable to malware detection and family classification, security experts in SOCs and researchers in academia can obtain useful information from our survey.
• While various surveys focus only on comparing existing methods, we also cover industrial and community efforts on so-called TLS fingerprinting techniques. As with many data-driven approaches, the performance of NTA is directly related to the quality of the dataset. Fortunately, several open-source intelligence (OSINT) threat feeds [33,34] now provide TLS fingerprint information. Therefore, as we discuss, better traffic analysis results can be achieved by integrating such information.
Table 1 compares this survey with other related surveys on encrypted NTA in terms of protocols, problem domains, and methods. Papadogiannaki et al. [18] provide a comprehensive and up-to-date survey on encrypted NTA methods and their countermeasures, but it is considerably less detailed in the security domain. Pacheco et al. [19] surveyed machine learning-based traffic classification methods for both encrypted and unencrypted traffic; however, their survey does not include recent methods for encrypted malware traffic. To the best of our knowledge, Velan et al. [20] presented the first survey on encrypted NTA in 2015; however, considerable research has been conducted in this area since then. Aceto et al. [21] experimentally evaluated several existing deep learning-based methods, focusing on mobile applications. Conti et al. [22] provided a comprehensive survey on NTA methods for mobile devices, where most of the methods capture mobile traffic at the mobile devices or at Wi-Fi access points. While some enterprise SOCs with wireless networks can utilize the methods discussed in [22], mobile device-specific and wireless link-specific features are not available in many SOCs that protect servers in wired networks. Therefore, we focus on NTA methods that do not rely on link-specific features. Poh et al. [23] presented a survey on privacy-preserving inspection in middleboxes with only slight coverage of machine learning techniques. Rezaei et al. [24] briefly overviewed deep learning-based methods for encrypted traffic classification. Shen et al. [25] provided a brief overview of machine learning-based encrypted traffic classification; however, the primary topic of their article is feature selection and optimization for a website fingerprinting dataset. These surveys [24,25] address either machine learning-based or deep learning-based methods, whereas we include both kinds of approaches with consideration of the security domain. Shbair et al. [26] described research efforts on service identification inside HTTPS, while de Carnavalet et al. [27] extensively introduced industry practices on TLS interception.
In this paper, we discuss the state-of-the-art methods applied in TLS-encrypted NTA, especially those applicable to malware detection and family classification for SOCs. We believe that security experts in SOCs can follow the industrial trends in TLS-encrypted malware NTA from [27] and the academic trends from this article.
The remainder of this paper is organized as follows. In Section 2, we provide background on the basics of SSL/TLS and the goals of NTA for SOCs. Section 3 presents three deployment models of NTA solutions: TLS interception without a private key, inspection using cryptographic functions, and inspection without decryption. Among these, we introduce several approaches to inspection without decryption in detail, considering recent advances in the area. In Section 4, following an entire machine learning-based analysis pipeline, we discuss the state-of-the-art methods for each component of the pipeline. Section 5 concludes this survey and outlines future directions in TLS-encrypted NTA.

Background
In this section, we introduce background information on SSL/TLS and on the goals of NTA for SOCs.

Basics of SSL/TLS
SSL is the de facto standard Internet protocol used to establish secure end-to-end sessions over transmission control protocol (TCP) for providing communications privacy. According to the final draft of SSL 3.0 [35], SSL was designed to prevent several security attacks, such as eavesdropping, tampering, and message forgery. However, SSL 3.0 was deprecated in June 2015 [36].
TLS, the successor of SSL, was first published in RFC 2246 [37] in 1999 and is backward compatible with SSL. The current version of TLS is TLS 1.3, defined in RFC 8446 [38]. Meanwhile, two of its previous versions (i.e., TLS 1.0 and TLS 1.1) were deprecated in March 2021 [39]. According to a report by Qualys SSL Labs [40] in October 2021, 99.6% of the surveyed websites support TLS 1.2 [41] and approximately half (49.7%) support TLS 1.3, while fewer than half support TLS 1.0 and TLS 1.1. Therefore, unless explicitly specified otherwise, we primarily focus on TLS 1.2 in this survey.
After a TCP connection has been established between two endpoints, one endpoint initiates a TLS session by sending a ClientHello message; this endpoint is referred to as the client of the TLS session. The ClientHello message contains the client's TLS version, a list of supported ciphersuites (i.e., cryptographic options) ordered by the client's preference, and a list of requested extensions (i.e., extended functionality from servers), such as the server name indication (SNI) extension defined in RFC 6066 [42]. Note that the HostName field in the SNI extension contains the fully qualified domain name system (DNS) hostname of the server (i.e., the other endpoint), which is primarily used for hosting multiple virtual servers, also known as virtual hosts, on a single end host (for example, to direct a request to the appropriate virtual server without decryption).
Next, the server responds with a ServerHello message containing the server's chosen ciphersuite among the client-offered ciphersuites, the server's chosen extensions among the client-requested extensions, a Certificate message containing a sequence (chain) of certificates for proving the identity of the server, and finally a ServerHelloDone message to indicate the end of the response. The client then verifies the authenticity of the given certificate chain. Once the ClientKeyExchange message is sent from the client, and ChangeCipherSpec and Finished messages are sent from both endpoints, the endpoints can exchange encrypted application data.
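The message flow above can be written down as data. The following minimal Python sketch lists the TLS 1.2 handshake type codes defined in RFC 5246 and checks whether an observed sequence of (direction, message) pairs follows the full RSA key-exchange handshake just described. The direction labels, the `is_rsa_full_handshake` helper, and the choice to model ChangeCipherSpec as a plain label (it is a separate record content type, not a handshake message) are assumptions of this sketch; it does not model ServerKeyExchange, client authentication, or session resumption.

```python
from enum import IntEnum


class HandshakeType(IntEnum):
    """Handshake message type codes from RFC 5246, Section 7.4."""
    HELLO_REQUEST = 0
    CLIENT_HELLO = 1
    SERVER_HELLO = 2
    CERTIFICATE = 11
    SERVER_KEY_EXCHANGE = 12
    CERTIFICATE_REQUEST = 13
    SERVER_HELLO_DONE = 14
    CERTIFICATE_VERIFY = 15
    CLIENT_KEY_EXCHANGE = 16
    FINISHED = 20


# ChangeCipherSpec is its own record content type, so it gets a string label here.
CCS = "ChangeCipherSpec"

# Full TLS 1.2 handshake with RSA key transport as (direction, message) pairs.
# "C" = client-to-server, "S" = server-to-client (labels assumed for this sketch).
RSA_FULL_HANDSHAKE = [
    ("C", HandshakeType.CLIENT_HELLO),
    ("S", HandshakeType.SERVER_HELLO),
    ("S", HandshakeType.CERTIFICATE),
    ("S", HandshakeType.SERVER_HELLO_DONE),
    ("C", HandshakeType.CLIENT_KEY_EXCHANGE),
    ("C", CCS),
    ("C", HandshakeType.FINISHED),
    ("S", CCS),
    ("S", HandshakeType.FINISHED),
]


def is_rsa_full_handshake(observed):
    """Return True if the observed (direction, message) pairs match the expected order exactly."""
    return list(observed) == RSA_FULL_HANDSHAKE
```

Because `HandshakeType` is an `IntEnum`, raw codes parsed from the wire (e.g., `("C", 1)`) compare equal to the named members, so the same list can be checked against decoded packets directly.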

The Goals of Network Traffic Analysis for SOCs
While SOCs vary in type and size, security experts in SOCs, whose roles include preliminary threat detection, triage of events, and incident response [43], perform NTA to achieve different purposes. Although this paper focuses only on malware detection and malware family classification, we list three main goals of security experts in SOCs associated with network security monitoring as follows:
• Malware detection: In malware (traffic) detection, NTA is used to detect network traffic that contains various types of malicious content or is associated with malicious applications. Traditionally, malicious traffic is detected according to pre-configured rules for known attacks, but machine learning-based detection has been proposed as a complement to signature-based network intrusion detection systems [44]. Malware detection methods typically rely on accumulated attack knowledge, so collecting and regularly updating the knowledge base is important in SOCs [2].
• (Network) anomaly detection: Network anomaly detection, or anomaly-based intrusion detection, is the problem of detecting exceptional patterns in network traffic that can be distinguished from the expected normal traffic pattern [45]. A broad range of anomaly detection techniques, such as statistical, unsupervised, and rule-based techniques, have been proposed in the literature [46]. Furthermore, deep learning-based anomaly detection systems are actively discussed [47]. However, in real-world SOCs, human security experts may be trusted more than automated methods, so some SOCs utilize or develop practical machine learning-based anomaly detection solutions combined with information visualization [48,49], which are out of our scope.
• Application identification: NTA for application identification identifies the network traffic of particular applications, including unauthorized applications. This can be used for specific policy enforcement in SOCs (e.g., blocking Amazon traffic during work hours). Recently, especially for mobile traffic, several machine learning-based solutions have been proposed that identify mobile applications and even user actions [21,22]; this is sometimes called user behavior analytics (UBA) in the context of SOCs [50]. While malware family classification can be seen as a variant of the conventional application identification problem, to the best of our knowledge, there is no NTA method that identifies the fine-grained behavior of malware from encrypted traffic.

The Deployment Models
To analyze encrypted traffic, including TLS-encrypted traffic, either middleboxes for traffic interception or traffic sensors for passive inspection are deployed in practice. We discuss three main deployment models for encrypted NTA: TLS interception, inspection using cryptographic functions, and passive inspection without decryption.

TLS Interception without a Private Key
As described in Section 2.1, TLS was primarily designed to establish an encrypted end-to-end session between the client and the server. In TLS interception, however, the client establishes a TLS session with a middlebox (typically a TLS proxy in this context, although it may also provide further inspection-related functionalities, such as those of a router, firewall, intrusion detection system (IDS)/intrusion prevention system (IPS), or content filter). As the middlebox is an endpoint of the TLS session, nothing prevents it from decrypting the application layer data. Hence, TLS interception transforms encrypted NTA into payload-based traffic analysis (also known as deep packet inspection (DPI)), which is well established in the literature [51]. Thus, TLS proxies and HTTPS proxies (which can be considered a combination of a TLS proxy and an HTTP proxy) are widely deployed for TLS interception, especially in enterprise SOCs [27,52].
Notably, as shown in Figure 1, following inspection and analysis, the middlebox forwards the data to the server via another TLS session (i.e., with re-encryption) between the middlebox (on behalf of the client) and the server, so that security policies can be enforced on the TLS traffic. This implies that although TLS follows a design motivated by the end-to-end argument [53], TLS interception breaks the end-to-end security property of TLS, which raises concerns about man-in-the-middle (MITM) attacks on TLS. Furthermore, either both the client and the server must trust the middlebox, or the middlebox must impersonate the server (e.g., with certificate delegation [54] or forged certificates [55]). As such, TLS interception has been the subject of various debates [27,56,57].

Inspection Using Cryptographic Functions
The second category of deployment models is inspection using cryptographic functions. We further classify it into two sub-categories: TLS inspection with a private key, and privacy-preserving inspection through searchable encryption [58][59][60].

TLS Inspection with a Private Key
In certain TLS configurations, such as TLS 1.2 with a Rivest-Shamir-Adleman (RSA)-based key exchange ciphersuite, TLS-encrypted traffic can be decrypted when the server shares its certificate private key with the middlebox, as shown in Figure 2 [27]. Similarly, Wireshark, a well-known network protocol analyzer, can decrypt TLS-encrypted traffic in such configurations. Therefore, out-of-band (passive) TLS inspection is possible using the private key and appropriate configurations. However, this approach cannot be applied to other TLS 1.2 ciphersuites, such as those based on ephemeral Diffie-Hellman key exchange. In addition, TLS 1.3 does not support key exchanges without forward secrecy, such as RSA-based key transport. Additionally, de Carnavalet and van Oorschot [27] discussed the use cases and issues of static Diffie-Hellman key sharing in detail.
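Whether a captured TLS 1.2 session can be decrypted passively with a shared key depends on the key exchange named in the negotiated ciphersuite. The hypothetical helper below sketches that dependency from an IANA ciphersuite name. It is a rough illustration, not a complete classifier: it distinguishes only static RSA key transport, static (EC)DH, ephemeral (EC)DHE, and the TLS 1.3 suite names, and labels everything else (e.g., PSK, anonymous, or export suites and signaling values) as unknown.

```python
# TLS 1.3 suite names omit the key exchange; TLS 1.3 key exchange is always ephemeral.
TLS13_SUITES = {
    "TLS_AES_128_GCM_SHA256",
    "TLS_AES_256_GCM_SHA384",
    "TLS_CHACHA20_POLY1305_SHA256",
    "TLS_AES_128_CCM_SHA256",
    "TLS_AES_128_CCM_8_SHA256",
}


def passive_decryption_outlook(suite_name: str) -> str:
    """Roughly classify an IANA ciphersuite name by its key exchange.

    "server-key": static RSA key transport; the server's private key suffices.
    "static-dh":  static (EC)DH; only if the static DH private key is shared.
    "ephemeral":  (EC)DHE; forward secrecy, so a long-term key does not suffice.
    "tls13":      a TLS 1.3 suite (ephemeral key exchange only).
    "unknown":    anything this sketch does not classify.
    """
    if suite_name in TLS13_SUITES:
        return "tls13"
    if not suite_name.startswith("TLS_") or "_WITH_" not in suite_name:
        return "unknown"  # e.g., signaling values such as TLS_FALLBACK_SCSV
    kx = suite_name[len("TLS_"):suite_name.index("_WITH_")]
    if "anon" in kx or "PSK" in kx:
        return "unknown"
    if kx == "RSA":
        return "server-key"
    if kx.startswith(("DHE_", "ECDHE_")):
        return "ephemeral"
    if kx.startswith(("DH_", "ECDH_")):
        return "static-dh"
    return "unknown"
```

For example, `TLS_RSA_WITH_AES_128_GCM_SHA256` maps to `"server-key"`, while `TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256` maps to `"ephemeral"`. This is why sharing the certificate private key only helps in the first case.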

Privacy-Preserving Inspection through Searchable Encryption
While the previous deployment models have been popularized by industry, they are based on trusting middleboxes. However, middleboxes with vulnerabilities can also be exploited by adversaries to compromise the privacy of the client and server [61]. According to Waked et al. [62], all tested enterprise-grade TLS interception middleboxes are vulnerable, albeit to varying degrees.
Several studies have enabled privacy-preserving inspection with the help of searchable encryption techniques (e.g., [58]). For example, BlindBox [63], shown in Figure 3, is a pioneering work on privacy-preserving deep packet inspection based on a searchable encryption technique. While two privacy models are implemented, the common idea is that the client transmits encrypted tokens, generated from the plaintext of a (for simplicity, unidirectional) TLS session, to the middlebox through an out-of-band channel. The middlebox then performs deep packet inspection rule matching over the encrypted tokens, which is enabled by the searchable encryption technique. In addition, as the encrypted tokens could differ from the content of the TLS-encrypted traffic, the receiver (which has the valid decryption key for the TLS session) cooperates with the middlebox (which should not have the key) by checking whether the tokens forwarded from the middlebox match the plaintext recovered from the TLS session. The authors of BlindBox extended their system to support a wide range of middlebox services, such as firewalls, network address translation (NAT), HTTP proxies, and deep packet inspection [64]. Yuan et al. [65] proposed an architecture for privacy-preserving deep packet inspection with a novel rule filter that achieves better performance than BlindBox. Ning et al. [66] utilized a reusable obfuscation mechanism for faster encrypted rule generation.
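The token-matching idea can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not BlindBox's actual scheme: HMAC-SHA256 replaces BlindBox's searchable encryption, the 8-byte window length is an illustrative assumption, rule keywords are assumed to be exactly one window long, and the encrypted rules are computed directly with the endpoint key rather than through a rule-preparation protocol that keeps the key from the middlebox. Unlike the real design, this deterministic sketch also reveals when the same token repeats.

```python
import hashlib
import hmac

WINDOW = 8  # token length in bytes; an illustrative assumption for this sketch


def window_tokens(data: bytes, window: int = WINDOW):
    """Window-based tokenization: every overlapping run of `window` bytes."""
    return [data[i:i + window] for i in range(len(data) - window + 1)]


def encrypt_token(key: bytes, token: bytes) -> bytes:
    """Deterministic keyed token (HMAC-SHA256 as a stand-in for searchable encryption)."""
    return hmac.new(key, token, hashlib.sha256).digest()


def sender_encrypted_tokens(key: bytes, plaintext: bytes):
    """Sender side: tokenize the plaintext and encrypt each token."""
    return [encrypt_token(key, t) for t in window_tokens(plaintext)]


def middlebox_match(encrypted_tokens, encrypted_rules):
    """Middlebox side: IDs of rules whose encrypted keyword appears among the tokens.

    The middlebox only compares opaque digests; it never sees plaintext or the key.
    """
    seen = set(encrypted_tokens)
    return [rule_id for rule_id, enc in encrypted_rules.items() if enc in seen]
```

For instance, with a rule keyword `b"evil.exe"` encrypted under the same key, the tokens of `b"GET /evil.exe HTTP/1.1"` yield a match for that rule while the middlebox only handles digests.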
Although the idea of privacy-preserving inspection through searchable encryption is interesting and innovative, such approaches are unfortunately less promising for the current generation of SOCs. First, compared with other deployment solutions, BlindBox and subsequent studies (e.g., [65,66]) exhibit poor performance. For instance, in BlindBox, given an IDS with a typical set of 3000 rules, the required client-side time is 97 s. Furthermore, BlindBox requires an out-of-band channel with its own protocol in conjunction with TLS, which is an implausible assumption for malware; malware may use other encrypted channels to hide its malicious network behavior. Similar weaknesses can be found in recent inspection methods using cryptographic functions, such as BlindIDS [67], IA2-TLS [68], and P2DPI [69]; these methods, as well as other methods [70,71] relying on trusted execution environments such as Intel SGX [72] in the middlebox, are beyond the scope of this survey.


Inspection without Decryption
The main motivation of this approach is that TLS-encrypted traffic itself exposes unencrypted metadata and has other measurable properties (for example, the packet length sequence and inter-arrival time sequence of a flow) that can be used to infer certain information about the encrypted content.
While Papadogiannaki and Ioannidis [73] proposed using the packet length sequence for exact signature matching on encrypted traffic, and a few exact pattern matching-based NTA methods can be found, especially in TLS fingerprinting (Section 4.4), most solutions in this category adopt graphical, statistical, or machine learning algorithms.
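As a minimal illustration of exact signature matching over packet lengths (not the specific algorithm of [73]), the sketch below treats a signature as a contiguous run of signed packet lengths (positive for client-to-server and negative for server-to-client, a convention assumed here) and reports every label whose signature occurs in a flow. Signature values and labels used with it are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def contains_signature(sequence: Sequence[int], signature: Sequence[int]) -> bool:
    """True if `signature` occurs as a contiguous run inside `sequence`."""
    n = len(signature)
    target = tuple(signature)
    return any(tuple(sequence[i:i + n]) == target
               for i in range(len(sequence) - n + 1))


def match_signatures(sequence: Sequence[int],
                     signatures: Dict[str, Tuple[int, ...]]) -> List[str]:
    """Sorted labels of all signatures that occur exactly in the flow's length sequence."""
    return sorted(label for label, sig in signatures.items()
                  if contains_signature(sequence, sig))
```

For example, `match_signatures([60, 517, -1448, -1448, 90], {"app-A": (517, -1448, -1448)})` returns `["app-A"]`. A production matcher would use a multi-pattern algorithm (such as Aho-Corasick) instead of this quadratic scan.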
A representative example of graphical methods is the graphlet in BLINC [74]. A graphlet is a graph that represents the transport layer interaction pattern between hosts, with the intent of identifying the network application. In BLINC, heuristics are used for application classification with graphlets. Statistical methods have been discussed in the context of application protocol classification. For example, Velan et al. [75] compared flow-based, packet-based, and byte-based statistics and observed that flow-based statistics are more stable than the others. Although there is a wide body of literature on statistical methods, most recent works on malware detection and family classification employ machine learning techniques. We discuss the machine learning-based methods in Section 4.
To avoid potential network performance degradation due to the on-the-path inspection using complex machine learning-based algorithms, various studies in this category implicitly assume that encrypted traffic is inspected off-the-path (for example, by traffic sniffing with switch port mirroring [76] or network taps, or flow record collection through NetFlow sensors [8,77]).
However, we should note that, in general, lightweight inspection without decryption can also be deployed on-the-path, as shown in Figure 1. For instance, there are near real-time protocol identification solutions that work without decryption, such as the protocol and application classification engine (PACE) [78] of iPoque (acquired by Rohde & Schwarz) and nDPI [79]. Similarly, several studies identify the application layer protocol using only the first few packets [80,81], even for encrypted traffic [82]. Nevertheless, such solutions primarily focus on protocol identification in the middlebox, in order to forward network traffic to a protocol-specific proxy (e.g., the TLS proxy in Section 3.1) for interception.

Machine Learning Pipeline for Passive Inspection of TLS-Encrypted Traffic
For the passive inspection and analysis of TLS-encrypted traffic, it is helpful to first describe the machine learning pipeline typically used by state-of-the-art models. Figure 4 shows a visual summary of such a pipeline. Based on this pipeline, in this section we discuss the state-of-the-art methods in TLS-encrypted NTA for each component of the pipeline.

Traffic Sniffing
Packet sniffers such as tcpdump can be used to collect TLS-encrypted traffic. Given the features used in the machine learning pipeline, it is desirable to run packet sniffers with appropriate packet filters, which can dramatically reduce the number of unnecessary packets and thereby improve the performance (e.g., throughput) of traffic analysis. For example, when the first byte of a TCP payload is 22 (the TLS handshake record type) and the sixth byte is 1 (the ClientHello handshake message type), the TCP segment can be assumed to carry a TLS ClientHello message. Using this information, a security expert can manually extract the HostName field of the SNI extension contained in the ClientHello message.
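The byte-offset check above, and the subsequent SNI extraction, can be sketched with the Python standard library as follows. The sketch assumes that the entire ClientHello arrives in a single TCP segment (no TCP reassembly and no record fragmentation) and reads only the first entry of the server name list. `build_client_hello` is a hypothetical helper that constructs a minimal ClientHello for testing; it is not a real TLS client.

```python
import struct
from typing import Optional

HANDSHAKE = 22     # TLS record content type: handshake
CLIENT_HELLO = 1   # handshake message type: ClientHello
SNI_EXT = 0        # extension type: server_name (RFC 6066)


def is_client_hello(payload: bytes) -> bool:
    """The check from the text: first byte 22 (handshake), sixth byte 1 (ClientHello)."""
    return len(payload) >= 6 and payload[0] == HANDSHAKE and payload[5] == CLIENT_HELLO


def extract_sni(payload: bytes) -> Optional[str]:
    """Return the SNI HostName from a ClientHello carried in one TCP payload, or None."""
    if not is_client_hello(payload):
        return None
    try:
        pos = 5 + 4                 # record header (5 bytes) + handshake header (4 bytes)
        pos += 2 + 32               # client_version + random
        pos += 1 + payload[pos]     # session_id (1-byte length prefix)
        (cs_len,) = struct.unpack_from("!H", payload, pos)
        pos += 2 + cs_len           # cipher_suites (2-byte length prefix)
        pos += 1 + payload[pos]     # compression_methods (1-byte length prefix)
        (ext_total,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        end = min(pos + ext_total, len(payload))
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", payload, pos)
            pos += 4
            if ext_type == SNI_EXT:
                # server_name_list length (2), then name_type (1) and name length (2)
                name_type = payload[pos + 2]
                (name_len,) = struct.unpack_from("!H", payload, pos + 3)
                if name_type == 0:  # host_name
                    return payload[pos + 5:pos + 5 + name_len].decode("ascii")
                return None
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None  # truncated or malformed ClientHello
    return None


def build_client_hello(hostname: str) -> bytes:
    """Hypothetical test helper: a minimal TLS 1.2 ClientHello record with an SNI extension."""
    name = hostname.encode("ascii")
    server_name = bytes([0]) + struct.pack("!H", len(name)) + name
    sni_data = struct.pack("!H", len(server_name)) + server_name
    extensions = struct.pack("!HH", SNI_EXT, len(sni_data)) + sni_data
    body = (b"\x03\x03" + bytes(32)                # client_version, all-zero random
            + b"\x00"                              # empty session_id
            + struct.pack("!H", 2) + b"\x00\x2f"   # one suite: TLS_RSA_WITH_AES_128_CBC_SHA
            + b"\x01\x00"                          # one compression method: null
            + struct.pack("!H", len(extensions)) + extensions)
    handshake = bytes([CLIENT_HELLO]) + len(body).to_bytes(3, "big") + body
    return bytes([HANDSHAKE]) + b"\x03\x01" + struct.pack("!H", len(handshake)) + handshake
```

`extract_sni(build_client_hello("example.com"))` returns `"example.com"`, while an application-data record (first byte 23) or a truncated ClientHello yields `None`.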

Collecting Flow Records
Once the TLS-encrypted traffic is sniffed, flow records should be collected, as various machine learning-based methods assume that the raw input is a unidirectional or bidirectional TLS flow. A flow record may contain various raw data on the corresponding flow and its packets.
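Before per-flow features can be computed, packets must be grouped into flows. A minimal sketch of bidirectional grouping is shown below; it assumes a hypothetical `Packet` record and keys each flow by a canonical (direction-independent) 5-tuple. Real flow exporters additionally terminate flows on idle and active timeouts and on TCP FIN/RST, which this sketch ignores.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical per-packet record produced by a sniffer."""
    ts: float     # capture timestamp in seconds
    src: str      # source IP address
    sport: int    # source port
    dst: str      # destination IP address
    dport: int    # destination port
    proto: int    # IP protocol number (6 = TCP)
    length: int   # TCP payload length in bytes


def flow_key(p: Packet):
    """Canonical key so that both directions of a connection map to the same flow."""
    a = (p.src, p.sport)
    b = (p.dst, p.dport)
    return (p.proto,) + (a + b if a <= b else b + a)


def group_flows(packets):
    """Group packets into bidirectional flows, keeping capture order within each flow."""
    flows = defaultdict(list)
    for pkt in sorted(packets, key=lambda q: q.ts):
        flows[flow_key(pkt)].append(pkt)
    return dict(flows)
```

Because the key is direction-independent, the first packet of each grouped flow (typically the one carrying the TCP SYN or the ClientHello) is what identifies the client side for later direction-aware features.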
In practice, conventional flow records can be collected from middleboxes (e.g., routers and switches) or from software-based traffic sensors (e.g., nProbe) that export NetFlow/IPFIX. While conventional NetFlow records contain very little information about TLS, McGrew and Anderson [83] proposed enhanced TLS flow records, which contain the sequence of packet lengths and (inter-arrival) times (SPLT), the byte distribution (BD) of the TLS flow data (an array counting each byte value in the packet payloads of the TLS flow), and TLS handshake metadata (features that can be collected during the TLS handshake procedure described in Section 2.1). The enhanced flow records can be collected using Cisco joy [84], an open-source prototype of Cisco's encrypted traffic analytics (ETA) [11] and a precursor of Cisco mercury; note that the current version of Cisco mercury does not collect some fields of the enhanced flow records.
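The SPLT and BD fields can be approximated from per-packet data as in the following sketch. It is not joy's implementation or output format: the signed-length convention, the millisecond rounding, the skipping of zero-payload packets, and the `max_packets` cap are assumptions made for illustration.

```python
def splt(packets, client, max_packets=50):
    """Sequence of packet lengths and inter-arrival times (SPLT) for one flow.

    packets: list of (timestamp_seconds, src, payload_length) in capture order.
    client: address of the TLS client; client-to-server lengths are positive and
    server-to-client lengths are negated (a convention assumed in this sketch).
    Times are milliseconds between consecutive non-empty packets; the first is 0.
    """
    lengths, gaps = [], []
    prev_ts = None
    for ts, src, size in packets[:max_packets]:
        if size == 0:
            continue  # skip zero-payload packets such as pure ACKs (an assumption)
        lengths.append(size if src == client else -size)
        gaps.append(0 if prev_ts is None else round((ts - prev_ts) * 1000))
        prev_ts = ts
    return lengths, gaps


def byte_distribution(payloads):
    """Byte distribution (BD): count of each byte value 0..255 over all payloads."""
    counts = [0] * 256
    for payload in payloads:
        for b in payload:
            counts[b] += 1
    return counts
```

Keeping the direction in the sign of each length lets a single integer sequence carry both size and direction, which is convenient for the Markov chain and sequence models discussed in Section 4.3.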
Note that while multiple TLS-encrypted NTA techniques utilize a subset of the enhanced TLS flow records in [83], several approaches use high-level connection logs as flow records for bot detection [85,86] and malware family classification [87]. These approaches can be effective for SOCs because such information can be readily collected using conventional network security monitoring systems, such as Zeek (formerly known as Bro [88]).
However, it is unclear whether these studies [85][86][87] are applicable to TLS-encrypted traffic, as they have not been evaluated on TLS-encrypted traffic datasets.

Feature Extraction
A flow record contains detailed information (i.e., features) on the corresponding flow, but some features may be less significant than others. In some cases, it is better to represent or summarize some features as a transformed feature to achieve certain goals (for example, interpretability of machine learning models and compact fingerprinting). Different names, or divisions into several steps (e.g., feature extraction and feature selection), can be found in the literature, but we refer to this procedure as feature extraction. Shen et al. [25] provide a useful tutorial on feature extraction for encrypted traffic classification.
A flow record consists of raw data for the corresponding flow and its packets. Such raw data may include the following types of features:
• Variable-size sequential data type: the TLS message type sequence, packet length sequence, inter-arrival time sequence, and time-slotted Zeek connection state log [86] have variable sizes, which makes them unsuitable as input for certain machine learning algorithms. Several studies transform variable-size data into statistical representative values (e.g., max, min, median, and standard deviation) or into a specific probabilistic/statistical object, such as a histogram and its self-similarity matrix [89], a first-order Markov chain [90], a second-order Markov chain [91], or a hidden Markov model [92], each of which can be represented as a finite-dimensional vector, although only statistical information remains in such models. Among these, the Markov chain transformation has been widely used in TLS-encrypted traffic classification. Note that the approach in [89] has only been validated on unencrypted traffic; hence, we consider its adoption for encrypted traffic a prospect for future work. In contrast, several approaches utilize machine learning algorithms that accept variable-size input. FS-Net [93] is an end-to-end traffic classification model based on a variant of the recurrent neural network (RNN), which takes the packet length sequence of a flow record as input. According to [93], FS-Net outperforms several Markov chain-based approaches in terms of the true positive rate and the false positive rate. Shen et al. [94] proposed a novel graph-based representation of packet length sequences (with the direction of each packet between the client and the server), called the traffic interaction graph (TIG). They also proposed a graph neural network that can classify decentralized applications on Ethereum from TLS-encrypted traffic.
• Categorical data type: each element of the TLS client-offered ciphersuite list and of the TLS client-advertised extension list takes one of a finite number n of values, so each list can be represented as an n-bit vector using one-hot encoding, although the order information of the list is lost. For example, Anderson and McGrew [17] observed that each element of the TLS client-offered ciphersuite list takes only 176 distinct values in their dataset. They also reported that an order-preserving representation of the list did not significantly increase performance.
• Numeric data type: there are several numeric fields in the TCP header and the TLS message header of each packet, and such data do not need to be transformed into other data types.
• String data type: the HostName field in the SNI extension, the Certificate message in the TLS handshake, the subjectAltName field in the Certificate message, and the TLS flow data can be considered string data. As each character has a unique value and a string has variable length, these data can also be treated as variable-size sequential data. In this context, the byte distribution in [83] can be viewed as a histogram of the TLS flow data. However, many approaches, such as [91], extract only the string length as a feature. In addition, ref. [17] reports that a mismatch between the subjectAltName field and the HostName field, if available, can be an effective feature for malware detection.
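The transformations listed above can be sketched in Python as follows. The bin size and number of bins for the Markov chain, the multi-hot encoding of list-valued categorical fields, and the simple wildcard handling in the subjectAltName/HostName comparison are illustrative assumptions rather than the exact choices of the cited works.

```python
import statistics


def summary_stats(seq):
    """Fixed-length summary of a non-empty variable-size numeric sequence."""
    return [max(seq), min(seq), statistics.median(seq), statistics.pstdev(seq)]


def markov_transition_matrix(lengths, bin_size=150, n_bins=10):
    """First-order Markov chain over binned packet lengths, flattened to n_bins**2 values.

    |length| // bin_size gives the state (capped at n_bins - 1); each row holds the
    empirical transition probabilities, and rows without outgoing transitions stay zero.
    """
    bins = [min(abs(x) // bin_size, n_bins - 1) for x in lengths]
    counts = [[0] * n_bins for _ in range(n_bins)]
    for a, b in zip(bins, bins[1:]):
        counts[a][b] += 1
    vector = []
    for row in counts:
        total = sum(row)
        vector.extend((c / total) if total else 0.0 for c in row)
    return vector


def multi_hot(values, vocabulary):
    """One bit per known value (e.g., ciphersuite codes); list order is discarded."""
    present = set(values)
    return [1 if v in present else 0 for v in vocabulary]


def san_sni_mismatch(sni, subject_alt_names):
    """1 if the SNI HostName matches none of the subjectAltName entries, else 0.

    Wildcards are handled only in the simple leftmost-label form (*.example.com).
    """
    host = sni.lower().rstrip(".")
    for name in subject_alt_names:
        name = name.lower().rstrip(".")
        if name == host:
            return 0
        if name.startswith("*.") and "." in host and host.split(".", 1)[1] == name[2:]:
            return 0
    return 1
```

Each function maps variable-size or categorical raw data to a fixed-size numeric vector, so their outputs can be concatenated into one feature vector per flow.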

TLS Flow Fingerprinting
In practice, a TLS fingerprint serves as an indicator of compromise (IoC) [95] that summarizes one or more TLS flow records with the same label, where the label depends on the problem domain (e.g., malware/benign in malware detection [96], a specific malware family in malware family classification [29], a mobile app in mobile traffic classification [21], and a user agent string in browser fingerprinting [97,98]). Therefore, a TLS fingerprint can be used as a condensed input for machine learning algorithms in training/testing, as well as a model representation of a class (i.e., a specific label).
A widely and actively used TLS fingerprinting method is JA3 [99]. This MD5 hash-based TLS client fingerprinting technique was proposed by John B. Althouse, Jeff Atkinson, and Josh Atkins of Salesforce in 2017 and is named after the initials of its three authors. A JA3 fingerprint summarizes the SSL/TLS version, the offered ciphersuite list, the TLS extension list, and elliptic curve-related information (supported curves and point formats) in the TLS ClientHello message of a TLS session using the MD5 hash function, while ignoring Google's GREASE (Generate Random Extensions And Sustain Extensibility) values [100].
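A minimal sketch of how a JA3 string and hash can be assembled from already-parsed ClientHello fields is shown below. It follows the published JA3 description (five comma-separated fields, list values joined by hyphens, decimal values, GREASE values dropped) but is not the reference implementation; parsing the ClientHello into these fields is assumed to happen elsewhere.

```python
import hashlib


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) look like 0x?A?A with both bytes equal, e.g. 0x0A0A, 0xFAFA."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """SSLVersion,Ciphers,Extensions,EllipticCurves,EllipticCurvePointFormats (decimal)."""
    def join(values):
        return "-".join(str(v) for v in values if not is_grease(v))
    return ",".join([str(version), join(ciphers), join(extensions),
                     join(curves), join(point_formats)])


def ja3_hash(version, ciphers, extensions, curves, point_formats) -> str:
    """MD5 hex digest of the JA3 string."""
    s = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(s.encode("ascii")).hexdigest()
```

For example, `ja3_string(771, [0x0A0A, 4865, 4866], [0x1A1A, 0, 10, 11], [0x2A2A, 29, 23], [0])` yields `"771,4865-4866,0-10-11,29-23,0"`: the GREASE entries are dropped, so clients that randomize GREASE values still produce a stable fingerprint.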
Currently, various practical TLS fingerprinting databases, such as mod_sslhaf [101] from Qualys SSL Labs, p0f [102] of Marek Majkowski, FingerPrinTLS [103] of Lee Brotherston, and JA3-based OSINT feeds [33,34], accumulate labeled TLS fingerprints for different goals (i.e., with different types of labels); all of these databases are built for exact matching scenarios and tools. For example, ref. [99] recommends using JA3 for blacklist-based access control of TLS-encrypted malware traffic and for whitelist-based access control of legitimate applications in locked-down environments. However, despite its popularity, it is unclear whether JA3 is a reliable fingerprint for such scenarios, owing to the lack of evaluation results. To the best of our knowledge, ref. [104] is the only study on JA3's reliability. Its authors argue that JA3 alone is not sufficient for mobile app identification, but that a combination of JA3, JA3S, and SNI can improve reliability. Note that Kotzias et al. [105] reported a 7.3% fingerprint collision rate in their longitudinal passive dataset when applying a client fingerprinting technique similar to JA3.
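Exact-match lookups of this kind are straightforward to express. The hypothetical sketch below prefers the most specific available key, trying (JA3, JA3S, SNI), then (JA3, JA3S), then JA3 alone, in the spirit of the combination suggested by [104]. The fallback order and any database contents are assumptions for illustration, not a published algorithm or a real feed format.

```python
from typing import Dict, Optional, Tuple


def lookup(db: Dict[Tuple[str, ...], str], ja3: str,
           ja3s: Optional[str] = None, sni: Optional[str] = None) -> Optional[str]:
    """Exact-match lookup that prefers the most specific key available.

    db maps (ja3, ja3s, sni), (ja3, ja3s), or (ja3,) tuples to labels.
    Returns the label of the most specific matching key, or None.
    """
    candidates = []
    if ja3s is not None and sni is not None:
        candidates.append((ja3, ja3s, sni))
    if ja3s is not None:
        candidates.append((ja3, ja3s))
    candidates.append((ja3,))
    for key in candidates:
        if key in db:
            return db[key]
    return None
```

A blacklist or whitelist decision then reduces to checking the returned label; collisions such as those reported by [105] show up as one JA3 key mapping to several labels, which the more specific keys can help disambiguate.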
Meanwhile, several studies in the literature propose approximate, machine learning-based matching for TLS fingerprinting. Korczynski and Duda [90] proposed stochastic fingerprints for application classification of TLS-encrypted traffic. In their study, the sequence of TLS message types observed for each application is modeled as a first-order homogeneous Markov chain fingerprint, and the classifier is based on the maximum likelihood (ML) criterion. Inspired by Frolov and Wustrow [106], Anderson and McGrew [107] used the Levenshtein distance for approximate matching when exact matching failed, although this approach performed worse than exact matching. Nevertheless, approximate matching deserves further study, considering that TLS-encrypted traffic with the same label evolves over time.
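A first-order homogeneous Markov chain fingerprint with maximum likelihood classification, in the spirit of Korczynski and Duda [90], can be sketched as follows. This is a simplified illustration rather than their implementation: the message-type labels (e.g., "CH" for ClientHello) are hypothetical abbreviations, and the additive (Laplace) smoothing parameter `alpha`, used so that transitions unseen in training do not receive zero probability, is our assumption.

```python
import math
from collections import Counter, defaultdict


class MarkovFingerprint:
    """First-order homogeneous Markov chain over TLS message types (sketch)."""

    def __init__(self, sequences, states, alpha=1.0):
        # sequences: non-empty lists of message types observed for one application
        # states: the full message-type alphabet shared by all fingerprints
        self.states = list(states)
        k = len(self.states)
        first = Counter(seq[0] for seq in sequences if seq)
        n_first = sum(first.values())
        # Smoothed log-probabilities of the initial message type
        self.log_init = {
            s: math.log((first[s] + alpha) / (n_first + alpha * k)) for s in self.states
        }
        # Count transitions a -> b across all training sequences
        trans = defaultdict(Counter)
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                trans[a][b] += 1
        # Smoothed log-probabilities of each transition (one row per source state)
        self.log_trans = {}
        for a in self.states:
            row_total = sum(trans[a].values())
            self.log_trans[a] = {
                b: math.log((trans[a][b] + alpha) / (row_total + alpha * k))
                for b in self.states
            }

    def log_likelihood(self, seq):
        # log P(seq) = log P(s0) + sum_i log P(s_i | s_{i-1})
        ll = self.log_init[seq[0]]
        for a, b in zip(seq, seq[1:]):
            ll += self.log_trans[a][b]
        return ll


def classify(seq, fingerprints):
    """Maximum likelihood rule: pick the label whose chain gives seq the highest probability."""
    return max(fingerprints, key=lambda label: fingerprints[label].log_likelihood(seq))
```

Working in log-probabilities avoids numerical underflow for long message sequences; every message type in a classified sequence must belong to the shared alphabet.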
Additionally, Cisco joy and Cisco mercury provide the largest TLS fingerprint database, labeled with potential (malicious or legitimate) application and operating system information and collected from malware sandboxes and enterprise networks. However, it is not widely adopted in industry. In contrast, although multiple security tools and middleboxes in industry (e.g., FlowMon [108]) and security communities support JA3/JA3S, the JA3 community database is relatively small. To bridge this gap, JA3cury [109] proposed a technique to translate each fingerprint record in the mercury database into JA3.

Feature Representation
While the features in a flow record or a TLS fingerprint can be used as raw input to a machine learning algorithm, in some cases it is better to transform them further into a more machine learning-friendly representation. For example, Anderson and McGrew [17] proposed contextual flows, which correlate a TLS-encrypted flow with related DNS and HTTP flows to enhance the performance of the machine learning classifier (especially its accuracy at a 0.00% false discovery rate). While [17] simply combines the features of the TLS flow and the contextual flows, a better representation of the combined feature set could further enhance classifier performance.
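One simple way to attach DNS context to a TLS flow, loosely following the idea of contextual flows in [17], is to link the flow's destination IP address to the DNS response that resolved it and append features of that response. The sketch below is hypothetical: the `DnsRecord` type, the four DNS features, and the trailing presence flag are illustrative choices, not the feature set used in [17].

```python
from dataclasses import dataclass
from typing import List

N_DNS_FEATURES = 4


@dataclass
class DnsRecord:
    qname: str          # queried domain name
    answers: List[str]  # IP addresses returned in the response
    ttl: int            # TTL of the answer records, in seconds


def dns_features(record: DnsRecord) -> list:
    # Hypothetical features: name length, number of labels, answer count, TTL
    labels = record.qname.rstrip(".").split(".")
    return [len(record.qname), len(labels), len(record.answers), record.ttl]


def contextual_vector(tls_features, dst_ip, dns_records):
    """Append DNS context to a TLS flow's feature vector.

    dns_records is assumed to hold only responses seen before the flow, ordered by
    time, so the last match is the most recent resolution of dst_ip. A trailing
    1/0 flag marks whether any DNS context was found.
    """
    context = None
    for record in dns_records:
        if dst_ip in record.answers:
            context = record
    if context is None:
        return list(tls_features) + [0] * N_DNS_FEATURES + [0]
    return list(tls_features) + dns_features(context) + [1]
```

The explicit presence flag lets a classifier distinguish "no DNS context observed" from genuinely zero-valued DNS features.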
Recently, as discussed in Section 4.3, Shen et al. [94] proposed a graph-based representation called the traffic interaction graph (TIG) to represent decentralized application flows. The representation explicitly encodes packet direction, packet length, packet burst, and packet ordering information, allowing the GNN to extract such information.
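The following sketch builds a simplified packet graph in the spirit of TIG. Nodes carry signed packet lengths (positive for client-to-server and negative for server-to-client, an assumed convention), a burst is a run of consecutive packets in the same direction, intra-burst edges link consecutive packets within a burst, and inter-burst edges link the first packets and the last packets of adjacent bursts. These edge rules are our approximation and may differ from the exact construction in [94]; the GNN itself is omitted.

```python
def build_interaction_graph(signed_lengths):
    """Return (node_features, edges) for a simplified burst-based packet graph.

    signed_lengths: packet sizes in arrival order, positive for client-to-server and
    negative for server-to-client (zero-length packets are assumed absent).
    Edges are (src, dst) pairs of packet indices.
    """
    bursts = []  # each burst is a list of packet indices with the same direction
    for i, length in enumerate(signed_lengths):
        same_direction = bursts and (length > 0) == (signed_lengths[bursts[-1][0]] > 0)
        if same_direction:
            bursts[-1].append(i)
        else:
            bursts.append([i])

    edges = []
    for burst in bursts:
        # Intra-burst edges preserve packet ordering inside a burst
        edges.extend(zip(burst, burst[1:]))
    for prev, nxt in zip(bursts, bursts[1:]):
        # Inter-burst edges: first-to-first and last-to-last packets of adjacent bursts
        for edge in ((prev[0], nxt[0]), (prev[-1], nxt[-1])):
            if edge not in edges:  # single-packet bursts would otherwise duplicate edges
                edges.append(edge)
    return list(signed_lengths), edges
```

The node list keeps length and direction, the burst grouping captures bursts, and the edge structure captures ordering, which is the information the text says TIG makes available to the GNN.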
Another recent advancement in this field is nPrint [110], a packet representation that is complete (i.e., every bit of a packet header is included), inherently normalized (for machine learning models), and aligned (i.e., each feature is always located at the same offset in every packet). With this representation, automated machine learning (AutoML) systems can learn the importance of each feature without manual feature engineering, on which the model of Anderson and McGrew [17] relies heavily. In [110], the authors visualized per-bit feature importance for several traffic analysis scenarios, such as active device fingerprinting, passive OS detection, and browser and app identification.
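The alignment and normalization properties of nPrint can be illustrated with a minimal encoder: each header is expanded into its bits, and every header slot is padded to a fixed width with -1 so that absent headers or bits are distinguishable from 0. The slot names and widths here are assumptions for illustration (e.g., 480 bits for the maximum 60-byte IPv4 or TCP header and 64 bits for the 8-byte UDP header); the actual nPrint tool [110] defines its own header layouts and options.

```python
def header_bits(data: bytes):
    # Most-significant bit first, in the order the bits appear on the wire
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def nprint_like(headers, layout):
    """Encode headers into one fixed-length vector of 1/0/-1 values.

    headers: dict mapping a slot name to the raw header bytes (missing key = header absent)
    layout:  ordered (slot_name, max_bits) pairs; every packet uses the same layout, so
             each vector position always refers to the same header bit (alignment).
    """
    vector = []
    for name, max_bits in layout:
        bits = header_bits(headers.get(name, b""))[:max_bits]
        # -1 marks bits that do not exist in this packet (absent or shorter header)
        vector.extend(bits + [-1] * (max_bits - len(bits)))
    return vector


# Hypothetical layout covering maximum-size IPv4, TCP, and UDP headers
LAYOUT = [("ipv4", 480), ("tcp", 480), ("udp", 64)]
```

Because every packet maps to a vector of the same length with the same bit semantics at each position, a model's per-feature importance can be read back as per-bit importance of specific header fields.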

Machine Learning Algorithms and Model Selection
Owing to extensive studies in this field, various machine learning algorithms are available. Once the feature representation is complete, these algorithms can readily be applied to TLS-encrypted traffic. For example, Anderson and McGrew [96] provided a detailed comparison of several well-known machine learning models for malware detection: linear regression, logistic regression, support vector machine (SVM), decision tree, random forest, and multilayer perceptron (MLP), given an extensive feature set engineered by security experts. They also considered the possibility of noisy labels. According to their work, random forest is the most robust machine learning classifier for malware detection.
Evaluating and comparing several machine learning algorithms for model selection requires performance metrics. Accuracy, precision, recall, and F1-score are the metrics typically used in encrypted NTA. One interesting metric, proposed especially for SOCs, is accuracy at a 0.00% false discovery rate (FDR), which appears in [17,83,96]. Note that FDR is defined as the expected value of FP/(FP + TP), where FP and TP are the numbers of false positives and true positives [111], and performance evaluation with FDR control has been widely used in statistics and genomics. In contrast, although an exception [112] can be found in the traffic classification literature, FDR control has rarely been applied in the malware detection literature. As its definition shows, FDR depends heavily on false positives. Clearly, an incident response team in an SOC may not adopt machine learning-based methods if too many false positives occur, and a recent study [113] conducted an online survey to understand SOC analysts' perspectives on this issue in depth. Thus, feature selection can be conducted for each machine learning algorithm to control the FDR. In this context, refs. [17,83] justified the need to combine TLS metadata, sequence of packet lengths and times (SPLT), and byte distribution (BD) feature sets to achieve better accuracy at a 0.00% FDR, i.e., the accuracy measured at the decision threshold for which the FDR is 0.00%.
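The following sketch shows one way to compute accuracy at a 0.00% FDR from classifier scores: the threshold is set just above the highest score assigned to any benign sample, so that no benign sample is flagged and the empirical FDR on the evaluation set, FP/(FP + TP), is zero. The function names, the label convention (1 = malicious, 0 = benign), and the strict-inequality thresholding rule are assumptions; the cited works may differ in detail, for example in how ties are handled.

```python
def empirical_fdr(y_true, y_pred):
    """FP / (FP + TP) on one evaluation set; defined as 0.0 when nothing is flagged."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    return fp / (fp + tp) if fp + tp else 0.0


def accuracy_at_zero_fdr(y_true, scores):
    """Accuracy when the threshold is set just above the highest benign score.

    y_true uses 1 for malicious and 0 for benign; higher scores mean "more malicious".
    Returns (accuracy, predictions).
    """
    benign_max = max((s for t, s in zip(y_true, scores) if t == 0), default=float("-inf"))
    # Only samples scoring strictly above every benign sample are flagged, so FP = 0
    y_pred = [1 if s > benign_max else 0 for s in scores]
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true), y_pred
```

For instance, with labels [0, 0, 1, 1, 1] and scores [0.1, 0.6, 0.5, 0.7, 0.9], the benign maximum is 0.6, the malicious sample scored 0.5 is missed, and the accuracy at a 0.00% FDR is 0.8.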
In contrast, a growing body of research uses deep learning to conduct NTA without relying on security experts for feature engineering. Rezaei and Liu [24] introduced a set of deep learning-based methods for traffic classification. Similarly, Aceto et al. [21] provided an excellent systematic framework for comparing deep learning architectures for mobile encrypted traffic classification.
Meanwhile, with the advance of AutoML systems, model selection can be automated. nPrintML [110] uses AutoGluon-Tabular [114], which trains, optimizes, and tests over 50 machine learning models, such as tree-based methods, deep neural networks, and neighbor-based classifiers.

Hyperparameter Tuning
In machine learning, hyperparameter tuning is the process of determining the right combination of hyperparameters for a machine learning algorithm. For example, Anderson and McGrew [96] performed a simple grid search over a set of standard values with cross-validation. However, as highlighted in [21], hyperparameter tuning of machine learning algorithms for encrypted traffic classification is largely overlooked in the literature. As a solution, Holland et al. [110] recently proposed nPrintML, a system that automates feature extraction and hyperparameter tuning for various NTA tasks.
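A grid search with k-fold cross-validation, as in the simple tuning procedure mentioned above, can be sketched without any machine learning library. The `fit_score` callback, the round-robin fold assignment, and the toy threshold classifier in the usage lines are hypothetical; in practice a library routine (e.g., scikit-learn's `GridSearchCV`) would typically be used.

```python
import itertools


def grid_search(X, y, grid, fit_score, k=5):
    """Return (best_params, best_mean_score) over the Cartesian product of `grid`.

    fit_score(params, X_train, y_train, X_val, y_val) must train a model with
    `params` and return a validation score where higher is better. Requires k <= len(X).
    """
    names = sorted(grid)
    n = len(X)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for fold in range(k):
            # Round-robin assignment: sample j belongs to validation fold j % k
            val_idx = [j for j in range(n) if j % k == fold]
            train_idx = [j for j in range(n) if j % k != fold]
            scores.append(
                fit_score(
                    params,
                    [X[j] for j in train_idx], [y[j] for j in train_idx],
                    [X[j] for j in val_idx], [y[j] for j in val_idx],
                )
            )
        mean = sum(scores) / k
        if best is None or mean > best[1]:
            best = (params, mean)
    return best


def threshold_accuracy(params, X_train, y_train, X_val, y_val):
    # Toy "model": predict 1 when x exceeds hyperparameter t (training data unused)
    preds = [1 if x > params["t"] else 0 for x in X_val]
    return sum(p == label for p, label in zip(preds, y_val)) / len(y_val)


X = list(range(10))
y = [0] * 5 + [1] * 5
best_params, best_score = grid_search(X, y, {"t": [2.5, 4.5, 6.5]}, threshold_accuracy, k=5)
```

Here the threshold 4.5 separates the two classes perfectly in every fold, so it is selected with a mean validation accuracy of 1.0, while 2.5 and 6.5 each misclassify samples in two folds.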

Conclusions
In this survey article, we discuss several TLS-encrypted NTA methods and their deployment models in the context of malware detection and family classification for security experts in SOCs. We observe that while TLS interception is widely used in industry, rising privacy concerns have led researchers and some vendors to recommend inspection without decryption. Another approach, inspection using searchable encryption, is promising, but the current generation of SOCs has no incentive to deploy such solutions owing to performance issues and an implausible assumption about malware. Thus, we discuss the state-of-the-art methods suitable for SOCs that inspect TLS-encrypted traffic without decryption, focusing on machine learning-based methods. In particular, we emphasize the current trends in TLS fingerprinting in industry and academia, which can help security experts who intend to introduce machine learning-based methods in SOCs.
While a substantial number of studies have been conducted in this field, including some groundbreaking works in recent years, there is still room for further improvement, as follows:
• The existing proposals have been validated on different and small datasets. While the lack of diverse, large, and sharable labeled datasets is a persistent problem in NTA [115], sharing TLS fingerprints in OSINT feeds seems relatively plausible. Thus, designing OSINT-friendly TLS fingerprinting techniques with more features optimized for machine learning-based NTA can be a promising research direction.
• With the fast adoption of TLS 1.3, the visibility of TLS-encrypted traffic without TLS interception is rapidly decreasing in many SOCs, even when enhanced flow records are collected. This is because, in TLS 1.3, many features in the TLS handshake metadata (e.g., the server certificate, which is now encrypted) can no longer be collected owing to the protocol's inherently secure design. This implies that more features of TLS-encrypted traffic should be collected, together with novel feature representations, well-designed machine learning algorithms, and model optimization techniques, under the diverse constraints of SOCs (privacy, cost, automation, scalability, etc.). Recent advances in deep learning-based NTA offer a potential research direction.
• The current academic literature gives little consideration to real-time and online processing for NTA. Considering the higher computational requirements of deep learning-based methods, systematic and holistic approaches to NTA may be needed.