Article

Identity Leakage in Encrypted IM Call Services: An Empirical Study of Metadata Correlation

Department of Information Management, Central Police University, Taoyuan 333322, Taiwan
Future Internet 2026, 18(1), 12; https://doi.org/10.3390/fi18010012
Submission received: 11 November 2025 / Revised: 14 December 2025 / Accepted: 24 December 2025 / Published: 26 December 2025
(This article belongs to the Special Issue Information Communication Technologies and Social Media)

Abstract

Instant messaging (IM) applications are ubiquitous, and while end-to-end encryption protects message content, traffic metadata remains observable. This paper proposes a traffic correlation framework for IM call services under a passive ISP-level threat model to infer communication parties from encrypted traffic. The framework extracts and matches metadata from sustained, bidirectional call flows and jointly analyzes endpoint identifiability, shared server connectivity, symmetry in call duration and traffic volume, and service type indicators to derive correlation artifacts for matching. The framework is instantiated and evaluated on WhatsApp, Facebook Messenger, and Snapchat across diverse user behavior scenarios and commonly deployed network settings. Experimental results show that the method reliably links caller and callee flows, revealing edges in users’ social graphs without decrypting any packets. Under typical data retention regimes, these findings indicate that metadata-based correlation provides a practical basis for deanonymization and represents a persistent privacy risk for users of IM calling.

1. Introduction

Instant messaging (IM) applications have become deeply embedded in modern digital life [1,2], offering real-time messaging, voice, and video call services that have reshaped user interaction. Although end-to-end encryption protects the content of communications, the interactive nature of these applications generates observable network traffic patterns that may compromise user anonymity [3,4,5,6,7,8,9,10,11].
Of particular concern are voice and video call services, which establish sustained and time-dependent traffic flows. Unlike text-based messaging, which typically relies on server-mediated relay mechanisms that decouple the timing of sender and receiver traffic, call sessions involve continuous, real-time signaling and media exchanges. These bidirectional flows create distinctive fingerprints that are potentially observable by network intermediaries. Given that governments and telecommunication carriers widely implement data retention regimes for law enforcement and commercial purposes [12,13,14,15,16,17,18,19], such mandates ensure that traffic metadata is persistently stored and available for analysis. Consequently, by leveraging these historical records to correlate traffic patterns, an adversary can infer communication relationships and reconstruct social networks without decrypting any packets.
Existing research has primarily focused on traffic classification or digital forensics on end devices [4,5,20,21,22,23,24,25,26,27,28,29]. However, few studies have systematically addressed the threat of user-pair correlation from the perspective of an ISP-level observer, particularly in the context of modern encrypted IM calls.
To address this gap, this paper proposes a traffic correlation framework to evaluate the feasibility of identity leakage in IM call services under a passive ISP-level threat model. The proposed framework exploits the sustained and bidirectional nature of call flows to systematically extract and match traffic patterns. To achieve this, six analysis dimensions are defined to construct a robust correlation mechanism: endpoint identifiability, common server connectivity, call duration symmetry, traffic volume symmetry, service type characteristics, and signaling artifacts.
The proposed framework is evaluated on three major IM applications, including WhatsApp, Facebook Messenger, and Snapchat, across diverse user scenarios and network architectures. Experimental results in a heterogeneous environment demonstrate that the derived metadata signatures exhibit sufficient distinctiveness to reliably correlate caller and callee flows. The main contributions of this paper are summarized as follows:
  • This paper formulates a realistic ISP-level threat model to assess privacy risks in IM call services, explicitly accounting for adversary capabilities alongside practical real-world challenges, including encrypted communication, massive user scale, and cross-network synchronization.
  • A systematic analysis framework is proposed that integrates six analysis dimensions, namely endpoint identifiability, common server connectivity, call duration symmetry, traffic volume symmetry, service type characteristics, and signaling artifacts, to correlate encrypted call sessions.
  • Robust empirical evidence, derived from extensive experiments analyzing bilateral traffic records across heterogeneous network architectures, diverse usage scenarios involving role-swapping, and multiple major IM applications, demonstrates that metadata analysis can reliably reveal communication relationships and edges in user social networks, highlighting a persistent privacy vulnerability under current data retention regimes.
The remainder of this paper is organized as follows. Section 2 reviews related work on privacy leakage in IM traffic, focusing on how network traffic analysis, traffic classification, data retention practices, and the threat model contribute to user privacy exposure. Section 3 presents the research methodology, including the experimental setup and data collection. Section 4 reports the analysis results and evaluates the degree of privacy leakage across different IM applications and call scenarios. Finally, Section 5 concludes the paper by summarizing the key findings and outlining directions for future research.

2. Related Works

2.1. Network Traffic Analysis and Classification

Network traffic analysis has long been employed for monitoring and understanding communication behavior. Traffic monitoring protocols, such as NetFlow, J-Flow, sFlow, OpenFlow, and IP Flow Information Export (IPFIX) [20,30], evolved from simple logging utilities into critical tools for network management, security analysis, and user profiling. Flow-based monitoring methods primarily capture metadata, including packet counts, flow durations, and header information, rather than the content of communication. To complement these approaches, deep packet inspection (DPI) emerged as a distinct technique capable of examining packet payloads [31,32]. The fundamental distinction lies in the depth of analysis: while flow-based technologies are restricted to header information, DPI provides visibility into the actual application data.
The increasing adoption of encryption has introduced significant challenges for traffic analysis [25]. As packet payloads become inaccessible, DPI techniques relying on content lose effectiveness. Consequently, researchers have increasingly focused on metadata-based approaches, where flow-level features such as packet length distributions and inter-arrival times remain observable despite encryption. Recent studies apply statistical modeling and machine learning to classify encrypted flows, using features such as packet length distributions, inter-arrival times, and burst patterns [23,25].
However, the ability to classify traffic types does not automatically imply the feasibility of correlating specific communicating pairs. In real-world ISP networks, this correlation capability is strictly constrained by the immense volume of data traffic. From the perspective of data retention regimes, the scale of traffic that must be logged is a critical factor; the larger and more fine-grained the retained dataset, the higher the cost and the more difficult it becomes to perform reliable message-based correlation in practice. To illustrate the magnitude of this problem for IM services, we next quantify the traffic volume generated by message exchanges using WhatsApp as an example.
Public statistics indicate that WhatsApp processes more than 100 billion messages per day worldwide, which corresponds to an average of approximately 69.4 million messages per minute. In the United States, the service has an estimated 100 million monthly active users [33]. Assuming that message traffic is roughly proportional to user distribution (the figures used here imply a U.S. share of about one-thirtieth of global traffic), and based on our preliminary measurements indicating that a single text message can generate four distinct sessions, this would result in approximately 9.26 million WhatsApp-related sessions per minute in the U.S. alone. At this scale, even when IM traffic can be reliably detected, associating individual senders with their corresponding receivers is extremely difficult in practice, which limits the potential of message-based traffic for reliable re-identification.
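The estimate above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the global user base of roughly 3 billion monthly active users is an assumption that is not stated in the text, chosen because it is the value that the 9.26 million figure implies. The other inputs come from the paragraph above.

```python
# Back-of-the-envelope reproduction of the session-rate estimate.
MESSAGES_PER_DAY = 100e9      # from the text: more than 100 billion messages/day
MINUTES_PER_DAY = 24 * 60
US_MAU = 100e6                # from the text: ~100 million U.S. monthly active users
GLOBAL_MAU = 3e9              # ASSUMPTION: not stated in the text
SESSIONS_PER_MESSAGE = 4      # from the text: preliminary measurement

messages_per_minute = MESSAGES_PER_DAY / MINUTES_PER_DAY
us_share = US_MAU / GLOBAL_MAU
us_sessions_per_minute = messages_per_minute * us_share * SESSIONS_PER_MESSAGE

print(f"{messages_per_minute / 1e6:.1f} million messages per minute worldwide")
print(f"{us_sessions_per_minute / 1e6:.2f} million U.S. sessions per minute")
```

Changing `GLOBAL_MAU` or `SESSIONS_PER_MESSAGE` shows how sensitive the estimate is to these two inputs.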
In contrast to message-based traffic, call services within IM applications exhibit continuous and time-dependent flows. The data exchanges generated during a call are temporally aligned between the communicating parties. Furthermore, during a voice or video call, the caller transmits media traffic that is received by the callee in real time, while the callee simultaneously transmits media traffic that is received by the caller. As a result, the media flows in both directions can exhibit similar characteristics, leading to a degree of symmetry between the two communication parties. Such properties make correlation considerably more feasible than in message traffic, highlighting call services as a distinct target for privacy analysis.

2.2. Data Retention Practices

In criminal investigations and counterterrorism applications, law enforcement agencies (LEAs) are often required to deploy nationwide monitoring systems across Internet service provider (ISP) infrastructures, spanning both mobile and fixed networks, to achieve broad surveillance coverage of millions of users. Accordingly, LEAs not only rely on traffic metadata records to identify types of user traffic but also analyze additional characteristics of the collected data to further narrow down potential suspects.
Denmark was among the earliest countries to introduce legislation mandating the retention of traffic metadata records. The law, enacted in 2006, required session logging to preserve communication metadata [34]. However, after several years, the government identified problems with both the implementation by Internet service providers and the limited investigative utility of the retained data. As a result, the legislation was repealed in 2014 [12]. The Danish framework specified two implementation options: retaining the first and last packet of each session or sampling every 500th packet of a user’s communication at the network boundary [34]. Major Danish carriers favored the latter due to its simpler data collection, lower costs, and reduced storage requirements, despite concerns about potential data loss. The legislation also required that retained records include source and destination IP addresses, port numbers, transport protocol information, and timestamps [34].
However, the legislation did not adequately account for the widespread deployment of carrier-grade network address translation (CG-NAT), which allows thousands of users to share the same IP address simultaneously, with addresses potentially changing every few minutes or even seconds [35]. This technology is extensively used in both mobile and fixed networks. When traffic metadata records are collected at network boundaries, the captured data often undergoes IP address and port translation, preventing LEAs from reliably associating specific subscribers with the services they access using only source and destination IP addresses [35].
In addition, Denmark’s requirement for per-user sampling presented further challenges. Implementation often encompassed all user traffic at a single collection point, potentially aggregating data from thousands of users [35]. This made reconstructing an individual’s activity difficult, as critical events could be omitted. Furthermore, the sampling scheme was based on total data volume rather than user activity, meaning that the inclusion of images or videos in messages increased the sampled data size without a corresponding rise in message count. This accelerated data accumulation compared to activity-based sampling and potentially compromised the accuracy and effectiveness of investigations [35].
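To illustrate why count-based sampling at an aggregated collection point can omit entire sessions, the following sketch retains every 500th packet of a synthetic stream. The number of users, the session length, and the back-to-back ordering of sessions are hypothetical values chosen for illustration. They are not measurements from the Danish deployment.

```python
SAMPLE_EVERY = 500          # from the text: every 500th packet is retained
NUM_USERS = 1000            # hypothetical number of users behind one collection point
PACKETS_PER_SESSION = 20    # hypothetical short session per user

# Aggregated stream seen at the collection point.
# Each entry is only the ID of the user who sent that packet.
stream = [user for user in range(NUM_USERS) for _ in range(PACKETS_PER_SESSION)]

# Keep the 500th, 1000th, 1500th, ... packet of the aggregate.
sampled = stream[SAMPLE_EVERY - 1 :: SAMPLE_EVERY]

observed_users = set(sampled)
missed_users = NUM_USERS - len(observed_users)
print(f"{len(sampled)} packets retained; "
      f"{missed_users} of {NUM_USERS} users leave no record")
```

In this toy setting most short sessions fall entirely between two sampling points, which is the omission risk described above.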
The United Kingdom (UK) has also legislated the retention of traffic metadata records. In 2016, the Investigatory Powers Act (IPA) was enacted, introducing a retention framework known as the Internet Connection Record (ICR) [36]. Compared to Denmark’s earlier scheme, the UK framework was presented as an improvement, offering a more flexible arrangement with carriers, a more representative collection mechanism, and the retention of information considered sufficient for investigative purposes. Furthermore, the UK’s scheme stipulates that retained ICRs must be linked to subscriber accounts, thereby enhancing their utility for law enforcement [35]. The UK government also broadened the scope of data retention beyond the provisions set forth in Denmark’s framework. Core ICR fields include a customer account reference, source and destination IP addresses and ports, and the start and end time of the session [37]. Additional information may include data volume, the name of the accessed Internet service or server, and elements of a URL that constitute communications data, typically the domain name [37]. Although the legislation was passed in 2016, the first trial of ICR retention was not conducted until 2019. The trial, implemented with smaller telecommunication carriers, aimed to evaluate whether an operationally efficient system could be developed and whether the retained data were sufficient, accurate, and necessary for investigations [38]. These developments suggest that large-scale retention of network traffic metadata could enable the observation of IM usage and amplify privacy risks, particularly in call services.
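The core and additional ICR fields listed above could be modeled as a simple record type. This is a hypothetical sketch: the class name, field names, and types are illustrative assumptions and do not reproduce any official ICR schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class InternetConnectionRecord:
    # Core fields named in the text.
    account_ref: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    start_time: datetime
    end_time: datetime
    # Additional fields that may be retained.
    bytes_total: Optional[int] = None
    service_name: Optional[str] = None
    domain: Optional[str] = None

    @property
    def duration_seconds(self) -> float:
        """Session length derived from the retained start and end times."""
        return (self.end_time - self.start_time).total_seconds()


# Made-up example values using documentation address ranges.
record = InternetConnectionRecord(
    account_ref="subscriber-0001",
    src_ip="192.0.2.10", src_port=50000,
    dst_ip="198.51.100.7", dst_port=3478,
    start_time=datetime(2025, 1, 1, 12, 0, 0),
    end_time=datetime(2025, 1, 1, 12, 3, 30),
    bytes_total=1_250_000,
)
print(record.duration_seconds)
```

Even this minimal record carries the account reference, session boundaries, and volume that the correlation analysis in later sections relies on.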
With the large-scale retention of network traffic metadata, it may be possible to identify the use of IM applications directly from network vantage points without relying on service provider logs [39]. However, message-based traffic in IM applications typically exhibits short, bursty patterns that are intermingled with other behavioral traffic, such as reading received messages or background synchronization, which limits the effectiveness of large-scale identity correlation and makes the reliable identification of communicating parties highly challenging. In contrast, call services generate sustained and time-dependent flows that directly involve both communicating parties. Such sustained flows create a distinct vector for bilateral correlation, thereby exposing communication edges and bypassing user anonymity. This underscores the necessity of a systematic investigation into the privacy vulnerabilities inherent in IM call service metadata.

2.3. Encrypted Traffic Analysis and Privacy Leakage

While encryption protects payload confidentiality, substantial research has demonstrated that network traffic detection and data classification can still be achieved by analyzing unencrypted metadata. Early approaches relied on statistical fingerprints, such as packet length distributions and inter-arrival times, to identify application protocols [26]. With the advancement of artificial intelligence, researchers have increasingly adopted Machine Learning (ML) and Deep Learning (DL) algorithms to enhance classification accuracy. Standard ML algorithms, including Random Forest (RF) and C4.5 decision trees, have been widely applied to distinguish between different types of encrypted traffic [26,40,41,42,43,44]. More recently, DL models such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks have been utilized to automatically extract complex spatiotemporal features from traffic flows, achieving high accuracy in application identification even without payload inspection [26,45,46]. To further address the challenge of data scarcity and rapid app updates, meta-learning techniques have emerged as a promising direction. By enabling models to adapt quickly to new tasks with limited examples, also known as few-shot learning, meta-learning has proven effective in handling the dynamic nature of encrypted traffic [47,48,49,50].
In real-world applications of IM, these techniques are primarily used for traffic classification and behavioral fingerprinting. Existing studies use statistical and machine learning models to isolate IM protocols from heterogeneous traffic or to infer granular user actions, such as typing or initiating calls, based on metadata features like packet sizes and burst volumes [3,6,10,24,25,51]. However, application-level identification yields limited discriminatory power regarding user identity. Due to the sheer scale of these platforms, merely confirming that a user is active on a popular application does not compromise their anonymity, as they remain statistically hidden within a massive pool of concurrent subscribers.
In contrast, this study targets a far more critical vulnerability: the reconstruction of pairwise communication relationships. By shifting the focus from single-endpoint classification to user-pair correlation, this paper investigates whether the inherent symmetry of IM call flows can be exploited to link callers and callees. Unlike classification, successful correlation directly links two endpoints, thereby effectively collapsing the anonymity set and exposing private social connections. Conducting such bilateral analysis necessitates simultaneous visibility over both communicating parties. Since this global visibility is inherently available to Internet Service Providers (ISPs), a rigorous examination of the threat from this specific vantage point is essential. Section 2.4 formalizes this threat model and specifies the analyst’s capabilities and constraints.

2.4. Threat Model

Building on the characteristics of network traffic analysis and the data retention regimes discussed earlier, a critical concern arises regarding whether the use of IM services, particularly call services, might inadvertently compromise user privacy despite the protection provided by end-to-end content encryption. This concern is particularly salient when contrasting traditional public switched telephone network (PSTN) and mobile telephony with modern IM applications. In traditional telephony, operators inherently possess the identities of both communicating endpoints via call detail records (CDRs) for billing and routing. In contrast, IM users often operate under the assumption that the decoupling of the application layer from the network layer, combined with encryption, shields their communication patterns from network intermediaries. Whether this assumption holds in practice against sophisticated metadata analysis is the central question of this study.
To address this, we formulate a threat model to assess whether metadata from IM call services can be exploited to reveal communication pairs and, by extension, reconstruct the social networks of users. We assume a passive network observer with ISP-level visibility, representative of entities such as law enforcement agencies or state-sponsored operators acting within legal frameworks that enable systematic metadata collection. To provide a structured analysis, the discussion of this threat model proceeds in three parts: adversary capabilities, practical constraints, and attack goal.

2.4.1. Adversary Capabilities

We define the adversary as an entity with ISP-level visibility and capacity. This model assumes that the adversary can obtain network traffic records from the service providers of both the caller and the callee. In practice, this corresponds to access to flow-level logs, such as NetFlow and IPFIX, as well as analysis results produced by DPI or lawful interception (LI) systems [13,14,15]. Under this assumption, the adversary is able to observe, at a minimum, the following attributes:
  • 5-tuple information: The adversary observes the source and destination IP addresses, source and destination ports, and the transport protocol.
  • Timing and volume: The adversary records precise timestamps for flow start and end times, flow durations, and volumetric statistics, including packet or byte counts.
  • User-IP mapping logs: Although CG-NAT generally obscures user identities behind shared public IP addresses [52], ISPs maintain internal mapping logs linking subscribers to transient IP addresses for commercial and operational purposes, specifically for data usage billing. Law enforcement agencies can request these records through legal processes to resolve the subscriber identity associated with a specific public IP and port combination.
  • Unencrypted packet content: While application payloads are generally encrypted, the adversary can inspect any unencrypted packet content using ISP-deployed DPI tools or lawful interception infrastructures [53,54,55]. This capability is not limited to basic headers but includes any unencrypted artifacts, such as DNS queries, Server Name Indication (SNI), or Session Traversal Utilities for NAT (STUN) attributes, that may reveal service types or assist in identifying communication endpoints.
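As an illustration of how unencrypted STUN attributes can be read by a passive observer, the sketch below parses a STUN message in the RFC 5389 wire format and decodes the USERNAME and XOR-MAPPED-ADDRESS attributes. The sample message is synthetic and is built inside the code. It combines both attributes in one message for brevity and is not taken from the captured traces. The function names are hypothetical.

```python
import socket
import struct

MAGIC_COOKIE = 0x2112A442          # fixed value defined by RFC 5389
ATTR_USERNAME = 0x0006
ATTR_XOR_MAPPED_ADDRESS = 0x0020


def parse_stun(data: bytes) -> dict:
    """Decode USERNAME and IPv4 XOR-MAPPED-ADDRESS from a STUN message."""
    if len(data) < 20:
        raise ValueError("too short for a STUN header")
    msg_type, msg_len, cookie = struct.unpack("!HHI", data[:8])
    if cookie != MAGIC_COOKIE:
        raise ValueError("magic cookie mismatch; not RFC 5389 STUN")
    result = {"type": msg_type}
    offset, end = 20, min(20 + msg_len, len(data))
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", data[offset:offset + 4])
        value = data[offset + 4:offset + 4 + attr_len]
        if attr_type == ATTR_USERNAME:
            result["username"] = value.decode("utf-8", errors="replace")
        elif (attr_type == ATTR_XOR_MAPPED_ADDRESS
              and attr_len >= 8 and value[1] == 0x01):  # family 0x01 = IPv4
            xport, xaddr = struct.unpack("!HI", value[2:8])
            port = xport ^ (MAGIC_COOKIE >> 16)
            addr = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            result["xor_mapped_address"] = (addr, port)
        # Attribute values are padded to a 4-byte boundary.
        offset += 4 + attr_len + (-attr_len % 4)
    return result


def build_attr(attr_type: int, value: bytes) -> bytes:
    """Encode one attribute with its 4-byte alignment padding."""
    padding = b"\x00" * (-len(value) % 4)
    return struct.pack("!HH", attr_type, len(value)) + value + padding


# Synthetic message (type 0x0101) with made-up attribute values.
ip_int = struct.unpack("!I", socket.inet_aton("203.0.113.5"))[0]
xma = struct.pack("!BBHI", 0, 0x01,
                  40000 ^ (MAGIC_COOKIE >> 16), ip_int ^ MAGIC_COOKIE)
body = (build_attr(ATTR_USERNAME, b"alice:bob")
        + build_attr(ATTR_XOR_MAPPED_ADDRESS, xma))
message = struct.pack("!HHI", 0x0101, len(body), MAGIC_COOKIE) + bytes(12) + body

parsed = parse_stun(message)
print(parsed)
```

Because these attributes travel outside the encrypted payload, no key material is needed to read them.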

2.4.2. Practical Challenges for the Adversary

To evaluate the robustness of the proposed threat, we consider several practical limitations in real-world networks that may hinder traffic correlation. These challenges serve as conditions for determining whether the threat model is valid:
  • Encrypted signaling and payload: Consistent with Section 2.1, communication content is end-to-end encrypted. Consequently, IM call signaling is also encrypted, preventing the adversary from retrieving call party information directly from packet payloads.
  • Massive user base and traffic volume: As detailed in Section 2.1, WhatsApp alone has an estimated 100 million monthly active users in the United States. The resulting volume of concurrent events makes isolating a specific caller-callee pair akin to finding a needle in a haystack.
  • Cross-network time alignment: Callers and callees frequently traverse different ISP networks or are recorded by different logging nodes. Clock skews, network jitter, and imperfect synchronization introduce timestamp discrepancies between vantage points. Because perfect alignment of flow start and end times cannot be assumed, the adversary must tolerate temporal uncertainty when attempting to match caller and callee sessions.
  • Targeted retention regime: Reflecting the privacy perspectives in Section 2.3, the adversary may operate under targeted retention rather than indiscriminate bulk retention. In such settings, high-granularity logging is activated only for specific subscribers, IP ranges, time windows, or services, rather than stored for all users indefinitely. A central question addressed by this study is whether significant privacy leakage still persists under these constrained observation conditions.
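The time-alignment constraint can be expressed as a tolerance window applied when flow boundaries from two vantage points are compared. The following is a minimal sketch. The two-second tolerance and the example timestamps are hypothetical values, not parameters reported in this study.

```python
def aligned(flow_a, flow_b, tolerance_s=2.0):
    """Return True if start and end times each differ by at most tolerance_s.

    Each flow is a (start, end) tuple of UNIX timestamps in seconds.
    """
    start_a, end_a = flow_a
    start_b, end_b = flow_b
    return (abs(start_a - start_b) <= tolerance_s
            and abs(end_a - end_b) <= tolerance_s)


caller_flow = (1_700_000_000.0, 1_700_000_185.0)     # logged by one ISP
callee_flow = (1_700_000_000.8, 1_700_000_185.6)     # logged by another ISP, skewed clock
unrelated_flow = (1_700_000_030.0, 1_700_000_120.0)  # different session

print(aligned(caller_flow, callee_flow))     # True
print(aligned(caller_flow, unrelated_flow))  # False
```

A wider tolerance absorbs more clock skew but also admits more unrelated flows, so the window trades robustness against the size of the candidate set.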

2.4.3. Attack Goal

The primary objective of the adversary within this framework is to correlate traffic. Despite practical challenges, including encryption, massive traffic volumes, cross-network time alignment discrepancies, and the absence of full traffic retention, the adversary attempts to exploit deterministic traffic characteristics, specifically temporal alignment and volumetric symmetry, to link the caller and callee. If successful, such correlation would bypass user anonymity and enable the reconstruction of social networks relying solely on traffic metadata. To empirically evaluate the feasibility of this attack and assess the extent of information leakage, Section 3 details our experimental methodology, outlining the data collection process and the specific analysis dimensions used to characterize these metadata patterns.

3. Methodology

This paper presents a traffic correlation framework to evaluate privacy leakage in IM call services via metadata analysis. Unlike conventional single-point classification, this study introduces a bilateral approach that correlates traffic features from both communicating endpoints. Validated through experiments in heterogeneous networks, the methodology is structured into three parts: the experimental setup (Section 3.1), the data collection procedure (Section 3.2), and a multi-dimensional analysis model (Section 3.3).

3.1. Experimental Setup

To realistically emulate scenarios involving users engaged in call activities, all experiments were conducted using physical smartphones to capture authentic network traffic. The testbed comprised two devices, a Google Pixel 6 and a Pixel 6 Pro, selected for their continued software support and upgradeability to Android 16. This ensures compatibility with current mobile applications and enhances the relevance of the experimental setup for real-world analysis.
The Pixel 6 was rooted with Magisk v29.0 to enable the installation of Android tcpdump, allowing direct traffic capture at the device network interface, particularly during 4G mobile network connections. This configuration reflects the typical usage scenario in which most users rely directly on mobile networks, while also enabling traffic to be captured as it traverses the operator’s infrastructure. In contrast, the Pixel 6 Pro remained unrooted to represent a typical end-user environment. Three widely used IM applications, WhatsApp, Facebook Messenger, and Snapchat, were installed on both devices. These applications were selected because they are globally popular platforms that provide call services, ensuring that the collected traffic reflects realistic and representative usage scenarios.
The experimental network architecture is illustrated in Figure 1. The rooted Pixel 6 was connected directly to the Internet through a 4G mobile network (Operator A) supporting both IPv4 and IPv6. In contrast, the Pixel 6 Pro accessed the Internet via a Wi-Fi hotspot provided by a Windows 11 laptop, which was connected to a fixed broadband network (Operator B) that only supported IPv4. The laptop was equipped with Wireshark, a widely adopted packet analyzer, to log all transmitted traffic. During the experiments, the two smartphones alternated between caller and callee roles to simulate typical VoIP call scenarios in IM applications. Table 1 summarizes the hardware specifications, software tools, and configurations used in the experimental environment. Table 2 lists the IM applications evaluated in this study, including their version numbers and supported call services.

3.2. Call Behavior Scenarios and Experimental Procedure

To systematically analyze traffic variations under different communication behaviors, five representative call scenarios were defined to capture differences in user responses and call termination dynamics. These scenarios reflect common usage conditions in VoIP call services and are described as follows:
  • Caller-initiated cancellation: The callee does not answer, and the caller manually cancels the call.
  • Callee-initiated rejection: The callee does not answer and manually rejects the incoming call.
  • System timeout termination: The callee does not respond, and the system automatically terminates the call after a timeout period.
  • Caller-terminated conversation: The callee answers the call, and the caller terminates the call.
  • Callee-terminated conversation: The callee answers the call, and the callee terminates the call.
These scenarios were executed under controlled conditions to ensure repeatability and comparability of results. To minimize interference from background network activity, each experiment was preceded by a short idle period. Because 4G networks typically employ CG-NAT and Wi-Fi networks use local network address translation (NAT), both public and private IPv4 addresses, as well as IPv6 addresses, were recorded for each device to ensure unique endpoint identification throughout the experiments. Each trial followed the procedure below:
  • Launch the IM application on the caller device and open the chat interface.
  • Maintain both devices in an idle state for several seconds to minimize background traffic.
  • Initiate synchronized traffic capture on both the caller and callee devices. If excessive background traffic is detected, restart the capture to ensure data integrity.
  • Execute the designated behavioral scenario according to the experimental plan.
  • Stop the traffic capture and store the traffic for subsequent analysis.
To ensure the reliability and consistency of the observations, each behavioral scenario was executed 3 times under identical conditions. Considering the comprehensive experimental design, which encompasses 3 applications, 2 service modalities (voice and video), 2 network directions (LTE-to-Fixed and Fixed-to-LTE), and 5 call scenarios, this resulted in a total dataset of 180 experimental trials.
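The trial count follows directly from the factorial design and can be checked by enumerating it. The sketch below is illustrative; the label strings are taken from the text, while the variable names are assumptions.

```python
from itertools import product

APPS = ["WhatsApp", "Facebook Messenger", "Snapchat"]
MODALITIES = ["voice", "video"]
DIRECTIONS = ["LTE-to-Fixed", "Fixed-to-LTE"]
SCENARIOS = [
    "caller-initiated cancellation",
    "callee-initiated rejection",
    "system timeout termination",
    "caller-terminated conversation",
    "callee-terminated conversation",
]
REPETITIONS = 3

# One tuple per trial: every combination of factors, repeated three times.
trials = [
    (app, modality, direction, scenario, run)
    for app, modality, direction, scenario
    in product(APPS, MODALITIES, DIRECTIONS, SCENARIOS)
    for run in range(1, REPETITIONS + 1)
]
print(len(trials))  # 180
```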

3.3. Multi-Dimensional Correlation Framework

To evaluate the captured traffic systematically, this study defines a multi-dimensional correlation framework comprising six specific analysis dimensions. These dimensions are designed to capture the invariant characteristics of VoIP traffic that persist despite encryption and network heterogeneity.
  • Endpoint Identifiability: This dimension investigates the visibility of legitimate endpoint IP addresses exposed during connection establishment. Since communicating endpoints often reside behind restrictive network environments, such as mobile networks employing CG-NAT or Wi-Fi networks behind standard NAT, IM applications must employ protocol negotiation to establish connectivity. The analysis assesses whether identifiers are disclosed during this negotiation process, which serves as a primary vector for leakage, or within the resulting traffic topologies. Disclosure, whether inherent in peer-to-peer sessions or inadvertent in server-relayed traffic, enables the direct mapping of network identifiers to physical subscriber identities.
  • Server Connectivity: This dimension analyzes concurrent connections to shared relay infrastructure, serving as a critical alternative when direct endpoint identification is unattainable. Since call initiation mandates simultaneous signaling and media sessions from both the caller and callee to specific application servers, monitoring these synchronized connections enables the detection of active call status. By correlating subscribers who establish concurrent sessions to the same server IP address, this metric significantly reduces the anonymity set, thereby isolating potential communicating pairs from the vast background traffic pool.
  • Call Duration Symmetry: This dimension quantifies the temporal alignment of flow start and end times between the caller and callee. This metric serves as a vital filter to resolve the ambiguity presented by concurrent connections to shared relay servers. Given the synchronized nature of signaling and media exchanges inherent to real-time communication, the start and end timestamps of the paired flows exhibit high correlation. Despite minor temporal variations introduced by network jitter or signaling delays, this strong temporal consistency is crucial for isolating the true communication pair from a large anonymity set.
  • Traffic Volume Symmetry: This dimension evaluates the volumetric consistency between the traffic transmitted by the caller and received by the callee, and vice versa. This analysis provides a critical second filter to further refine the ambiguity remaining after temporal correlation. Given the bidirectional exchange of media, packet counts and byte volumes between the paired flows are expected to be approximately equivalent. Although network factors such as MTU differences, fragmentation, or packet loss introduce minor discrepancies, the overall volumetric balance provides a robust signature for confirming bilateral flow correlation.
  • Service Type Characteristics: This dimension differentiates the traffic patterns associated with voice versus video modalities. Distinctions in traffic volume and behavioral signatures between service types allow for more granular fingerprinting and assessment of privacy risks.
  • Signaling Artifacts: This dimension extracts supplementary protocol-specific patterns, such as variations in user behavior or application-specific signaling features (e.g., STUN usernames), found in unencrypted phases. These artifacts provide strong corroborating evidence for a correlation and thereby heighten the privacy implications.
The following section presents the experimental results according to these six dimensions, allowing the captured traffic to be examined consistently across different call scenarios and IM applications.

4. Experiment Results

This section presents the analysis of network traffic generated by call services in WhatsApp, Facebook Messenger, and Snapchat. The captured traces are examined along the six analysis dimensions defined in Section 3.3. The objective is to characterize network behaviors observed under different call scenarios and to identify potential indicators of privacy leakage in VoIP metadata.

4.1. Endpoint Identifiability

In the caller-initiated cancellation, callee-initiated rejection, and system timeout termination scenarios, the IP addresses of both parties remain undisclosed when using WhatsApp and Snapchat. However, in the case of Facebook Messenger, both the public and private IP addresses of the caller are observable within the callee’s captured traffic under the same scenarios. Figure 2 illustrates an example where both IP addresses of the Facebook Messenger caller are exposed in the callee’s traffic.
In the caller-terminated and callee-terminated conversation scenarios, several distinct differences were observed among the three applications concerning endpoint identifiability.
For WhatsApp, when the caller utilized a mobile network, the communication frequently established a direct peer-to-peer (P2P) connection, resulting in the disclosure of both public and private IP addresses. Conversely, when the WhatsApp caller utilized a Wi-Fi connection, the communication typically reverted to a server-relay mechanism, leading to the successful concealment of IP address information from both parties. Figure 3 illustrates this vulnerability, displaying the full leakage of the caller’s public and private IP addresses (indicated by the red and green rectangles, respectively) upon connection establishment under the mobile network scenario.
In the case of Facebook Messenger, the disclosure of the public and private IP addresses of both communicating parties was consistently observed across all established scenarios, irrespective of the underlying mobile or Wi-Fi network. Even when the main media traffic was routed through application servers, protocol-specific packets (such as STUN) still revealed endpoint information during the call phase. Figure 4 exemplifies this characteristic, demonstrating how the public and private IP addresses of the communicating parties are disclosed even within the server-relayed traffic flow (indicated by the red rectangle).
In the case of Snapchat, only the public IP addresses of the two communicating parties were observable. Similar to Facebook Messenger, packets in Snapchat also typically revealed endpoint information during the call phase, even though the main traffic was relayed through application servers. In some tests, direct peer-to-peer traffic between the parties was also detected. Figure 5 illustrates two corresponding transmission modes: (a) direct traffic between communicating parties, and (b) server-relayed communication in which public IP addresses are still leaked.
Table 3 summarizes the endpoint identifiability results across the five call scenarios for the three IM applications, including the trial metrics and findings.

4.2. Server Connectivity

Many network sessions were observed while operating IM applications. Because millions of users may simultaneously connect to the servers of major IM platforms, background traffic from unrelated users can interfere with network-level analysis. To reduce this interference, the analysis in this section focuses on User Datagram Protocol (UDP) sessions, which display consistent and distinctive patterns during call activities. In addition, through multiple experimental trials, the sessions most strongly associated with call activities were identified and used as the primary focus for subsequent analysis.
Across all observed scenarios, a clear set of UDP-based behaviors was identified. The STUN protocol was widely used in all three IM applications. STUN messages enable devices behind NAT to determine their public IP addresses and port numbers, thereby facilitating peer-to-peer connectivity. Because STUN packets are commonly transmitted over UDP port 3478, they serve as reliable indicators of call initiation and signaling. In addition to signaling, UDP traffic carries real-time media streams once a call is established. As discussed in Section 4.1, most cases of endpoint information disclosure were associated with STUN or other UDP packets, which supports the focus on UDP traffic for server connectivity analysis.
The measurements show that, across all five call scenarios, the caller and callee in the three IM applications consistently connected to the same server or server cluster during call sessions. Table 4 shows that WhatsApp, Facebook Messenger, and Snapchat all employ centralized server infrastructures through which both participants maintain active, simultaneous connections while a call is ongoing. Furthermore, at least three distinct servers were typically observed on UDP for each of WhatsApp, Facebook Messenger, and Snapchat. It should be noted that minor variations in these counts may arise depending on the specific network conditions of the caller and callee at the time of the call.
In addition, an extra server with the IP address 31.13.87.54 and the hostname edge-turnservice-shv-01-tpe1.facebook.com was identified in Facebook Messenger on UDP port 40003, connecting both call parties and likely supporting supplementary signaling or media exchange. These counts represent unique server IPs observed from our vantage point during call sessions and remained consistent across repeated runs. Furthermore, additional observations revealed that when Mobile Device 1 used Facebook Messenger or Snapchat, the majority of traffic was transmitted through IPv6 servers.
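The reduction of the anonymity set through shared server connectivity can be sketched as follows. This is a minimal illustration under stated assumptions, not the tooling used in the study: the `Flow` record, its field names, and the subscriber labels are hypothetical, and documentation-range addresses stand in for real server IPs.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Flow:
    subscriber: str   # hypothetical label the observer assigns to a subscriber
    server_ip: str    # remote server address of the UDP session
    server_port: int  # remote UDP port
    start: float      # session start time in seconds
    end: float        # session end time in seconds


def candidate_pairs(flows, port=3478):
    """Return subscriber pairs with time-overlapping sessions to the same server IP."""
    by_server = defaultdict(list)
    for f in flows:
        if f.server_port == port:
            by_server[f.server_ip].append(f)

    pairs = set()
    for server_flows in by_server.values():
        for a, b in combinations(server_flows, 2):
            overlaps = a.start < b.end and b.start < a.end
            if a.subscriber != b.subscriber and overlaps:
                pairs.add(tuple(sorted((a.subscriber, b.subscriber))))
    return pairs


flows = [
    Flow("sub-A", "203.0.113.10", 3478, 0.0, 47.5),
    Flow("sub-B", "203.0.113.10", 3478, 0.9, 47.5),
    Flow("sub-C", "203.0.113.10", 3478, 100.0, 130.0),  # same server, later in time
    Flow("sub-D", "198.51.100.7", 3478, 0.0, 47.5),     # concurrent, different server
]
print(candidate_pairs(flows))
```

In this toy input only sub-A and sub-B remain as a candidate pair. On a real network many unrelated subscribers would share a server concurrently, which is why the duration and volume filters in the following subsections are needed.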

4.3. Call Duration Symmetry

In general, the experiments show that at least one session on the caller and callee sides exhibited approximately matching durations, which can be used to infer the relationship between the communicating parties. In some instances, within the caller-terminated and callee-terminated conversation scenarios, a peer-to-peer connection was established between the caller and callee, providing a second identifiable condition, as the session durations on both sides were nearly identical. The following examples describe how these patterns were observed across three different IM applications.
In the case of WhatsApp, this pattern typically involved three sessions. As shown in Figure 3, the caller’s connections to three servers lasted 47.5085 s, 47.5086 s, and 47.5086 s, while the callee’s corresponding sessions each lasted 46.5801 s. A direct peer-to-peer traffic transmission was also observed in this case, with the session durations recorded as 7.0237 s for the caller and 6.8717 s for the callee. In some cases, only one pair of sessions with similar durations was observed.
In the case of Facebook Messenger, numerous sessions were generated during each call. The analysis revealed that most traffic was relayed through different IP addresses for the caller and callee. However, when these IP addresses were resolved to their corresponding hostnames, consistent naming patterns were identified. As shown in Figure 6, after sorting the session durations in descending order, the hostnames associated with the longest sessions displayed structural similarities between the caller and callee. Although the IP addresses differed, the corresponding hostnames followed a predictable pattern, primarily differing by their IPv4 and IPv6 designations. For example, the caller and callee were connected to servers with hostnames such as edge-turnservices6-shv-01-tpe1.facebook.com and edge-turnservice-shv-01-tpe1.facebook.com, or edgeray-msgr6-shv-01-xxx1.facebook.com and edgeray-msgr-shv-01-xxx1.facebook.com. Table 5 lists the key observed server hostnames and their corresponding IP addresses for Facebook Messenger.
Repeated experiments confirmed that when the mobile network supported IPv6, Facebook Messenger preferentially used IPv6-based servers for call relay. Consequently, to identify the call duration of Facebook Messenger users, the analysis should primarily focus on the session durations associated with these groups of IPv4 and IPv6 relay servers (highlighted by the blue rectangle in Figure 6). The session durations of the caller were approximately 62 s, while those of the callee were around 61 s. Additionally, direct peer-to-peer transmission sessions lasted 48.444 s on the caller’s side and 48.8681 s on the callee’s side.
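Grouping the IPv4 and IPv6 relay hostnames into one family can be sketched as below. The normalization rule is an assumption inferred only from the two hostname pairs quoted above (the IPv6 variant carries an extra "6" or "s6" immediately before "-shv-"). It is not a documented naming scheme, and `relay_family` is a hypothetical helper.

```python
import re

# Assumed rule: the IPv6 hostname inserts "6" (or "s6") right before "-shv-".
_V6_MARKER = re.compile(r"s?6(?=-shv-)")


def relay_family(hostname: str) -> str:
    """Map an IPv4 or IPv6 relay hostname to a shared family key."""
    return _V6_MARKER.sub("", hostname.lower(), count=1)


caller_host = "edge-turnservices6-shv-01-tpe1.facebook.com"
callee_host = "edge-turnservice-shv-01-tpe1.facebook.com"
print(relay_family(caller_host) == relay_family(callee_host))
```

Sessions whose hostnames map to the same family key can then be compared by duration, even though the caller and callee connect to different IP addresses.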
In the case of Snapchat, this pattern typically involved at least one session. As shown in Figure 5a, the caller’s connection to a server lasted 39.781 s via the IPv6 address 2406:da1a:91:9700:2b64:ae7f:5c6c:936b on port 443, while the callee’s corresponding session lasted 38.3985 s via the IPv4 address 13.200.139.250 on the same port (indicated by the rectangle). Both IP addresses are hosted by Amazon Web Services (AWS). At the same time, a direct peer-to-peer transmission was observed, with the session duration recorded as 32.0886 s for the caller and 31.9996 s for the callee. In Figure 5b, the server-relayed traffic indicates that the caller’s session lasted 92.4029 s, while the callee’s sessions to three servers each lasted 90.9662 s (indicated by the rectangle).
The following section presents the statistical results for duration symmetry, which relies on quantifying the temporal alignment between the caller and callee sessions. We define the absolute and normalized call duration differences used in this analysis. Assume $D_C$ is the duration observed at the caller and $D_L$ is the duration observed at the callee. The absolute difference $T_{abs}$ measures the actual time offset:
$$T_{abs} = \left| D_C - D_L \right|$$
The normalized difference $T_{norm}$ measures the difference relative to the total session duration, using the sum of the durations as the basis for normalization:
$$T_{norm} = \frac{\left| D_C - D_L \right|}{D_C + D_L}$$
Table 6 summarizes the statistical results of call duration differences across the three IM applications, presenting the Median (Interquartile Range, IQR) on the first line, followed by the [Minimum, Maximum] range on the second line. The analysis of the Absolute difference demonstrates strong temporal symmetry across most scenarios. Specifically, the median absolute difference for WhatsApp and Snapchat consistently remained below 1.4 s. However, Facebook Messenger (Device 2) exhibited significantly higher variability, with median absolute differences reaching 2.493 s for voice calls and 2.030 s for video calls.
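As a worked illustration of the two duration metrics, the short sketch below applies them to the WhatsApp relay sessions reported earlier in this section (47.5085 s at the caller and 46.5801 s at the callee). The function name is hypothetical, and the absolute value reflects the reading of "absolute difference" as a non-negative offset.

```python
def duration_differences(d_caller: float, d_callee: float) -> tuple[float, float]:
    """Return (T_abs, T_norm) for one caller/callee session pair, in seconds."""
    total = d_caller + d_callee
    if total <= 0:
        raise ValueError("durations must sum to a positive value")
    t_abs = abs(d_caller - d_callee)
    t_norm = t_abs / total
    return t_abs, t_norm


# WhatsApp relay session durations quoted in Section 4.3.
t_abs, t_norm = duration_differences(47.5085, 46.5801)
print(f"T_abs = {t_abs:.4f} s, T_norm = {t_norm:.4f}")
```

For this pair the offset is under one second and about 1% of the combined duration, which is the kind of small value that marks a likely caller/callee match.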

4.4. Traffic Symmetry

The following section presents the results for traffic symmetry. The analysis relies upon comparing the total traffic volume captured at both communication endpoints. The relevant metrics are defined based on the sum of transmitted ($Tx$) and received ($Rx$) traffic at each party. Assume $V_C$ represents the total volume (packets or kilobytes) observed at the caller, and $V_L$ represents the total volume observed at the callee, where $V = Tx + Rx$.
The absolute difference $V_{abs}$ quantifies the actual volume offset:
$$V_{abs} = \left| V_C - V_L \right|$$
The normalized difference $V_{norm}$ quantifies the difference relative to the total traffic volume, utilizing the sum of the volumes as the basis for normalization:
$$V_{norm} = \frac{\left| V_C - V_L \right|}{V_C + V_L}$$
In pre-conversation states, which include the caller-initiated cancellation, callee-initiated rejection, and system timeout termination scenarios, the observed sessions generally exhibited asymmetric traffic volume. This asymmetry is primarily attributed to the lack of an established direct, bidirectional communication channel for media exchange between the two parties. Figure 7 illustrates an example of this asymmetric traffic pattern observed in the system timeout termination scenario, using WhatsApp as the representative application. Conversely, in the established conversation scenarios (caller-terminated and callee-terminated), traffic volume symmetry became evident. The statistical results quantifying this symmetry are summarized in Table 7 and Table 8. Table 7 presents the statistical summary of traffic volume differences (in packets) for established conversations (Scenarios 4 and 5), with the data presented as the Median (IQR) on the first line, followed by the range [Min, Max] on the second line. Similarly, Table 8 provides the statistical summary of traffic volume differences measured in kilobytes (KB) for the same established conversation scenarios, also presenting the Median (IQR) followed by the Range [Min, Max]. This clear symmetry in both packet counts and byte volume strongly reinforces the efficacy of traffic symmetry as a robust filtering mechanism for correlating communication pairs.
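The volume metrics can be computed in the same way as the duration metrics, with each endpoint's total taken as the sum of its transmitted and received traffic. The sketch below is illustrative only: the `EndpointVolume` type is hypothetical, and the sample packet counts are invented for the example rather than taken from Table 7 or Table 8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointVolume:
    tx: int  # units sent by this endpoint (packets or kilobytes)
    rx: int  # units received by this endpoint

    @property
    def total(self) -> int:
        return self.tx + self.rx  # V = Tx + Rx


def volume_differences(caller: EndpointVolume, callee: EndpointVolume):
    """Return (V_abs, V_norm) for one caller/callee pair."""
    v_abs = abs(caller.total - callee.total)
    denom = caller.total + callee.total
    v_norm = v_abs / denom if denom else 0.0
    return v_abs, v_norm


# Hypothetical packet counts, not measurements from the study.
caller = EndpointVolume(tx=1200, rx=1180)
callee = EndpointVolume(tx=1175, rx=1195)
v_abs, v_norm = volume_differences(caller, callee)
print(v_abs, round(v_norm, 4))
```

The same function applies unchanged to byte volumes, since only the unit of `tx` and `rx` differs.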

4.5. Service Type

In the caller-initiated cancellation, callee-initiated rejection, and system timeout termination scenarios, subtle differences were observed between the traffic of voice calls and video calls in the case of WhatsApp users. For example, as shown in Figure 7, three sessions were connected to the servers on both the caller and callee sides. However, the total number of packets and bytes across these three sessions was identical for the voice caller, while noticeable variations were present in the voice callee, video caller, and video callee cases. In contrast, the other IM applications did not exhibit significant differences between voice and video modalities under the same three scenarios.
In the caller-terminated and callee-terminated conversation scenarios, however, the overall traffic volume showed a clear distinction between voice calls and video calls. As illustrated in Figure 8, the video call sessions generated substantially higher traffic volumes than the voice call sessions in both directions.
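One simple way an observer might turn this volume gap into a voice/video label is to compare the mean bitrate of an established session against a cut-off. The sketch below is a hedged illustration: normalizing by duration is an added assumption, the function names are hypothetical, and the 200 kbit/s threshold is a placeholder that would have to be calibrated on labeled traces. It is not a value reported in this study.

```python
def mean_bitrate_kbps(total_bytes: int, duration_s: float) -> float:
    """Mean bitrate of a session in kilobits per second."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return total_bytes * 8 / 1000 / duration_s


def guess_service_type(total_bytes: int, duration_s: float,
                       threshold_kbps: float = 200.0) -> str:
    """Label a session 'video' or 'voice' using a placeholder bitrate threshold."""
    rate = mean_bitrate_kbps(total_bytes, duration_s)
    return "video" if rate >= threshold_kbps else "voice"


# Invented sample sessions of 60 s each.
print(guess_service_type(480_000, 60.0))    # 64 kbit/s
print(guess_service_type(7_500_000, 60.0))  # 1000 kbit/s
```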

4.6. Signaling Artifacts

During the experiments, in addition to the common characteristics discussed above, the three IM applications exhibited distinct behavioral patterns, further revealing the characteristics of user activity. The following sections describe these observations for each application individually.
  1. WhatsApp
Across all scenarios involving the caller and the callee, three sessions were typically established with three servers using UDP port 3478. Among these sessions, two displayed nearly identical total packet counts (both to and from the server). However, a distinct asymmetry was observed in these sessions: the caller consistently transmitted a higher number of packets than the callee. Specifically, the caller’s packet count was generally in the double digits (e.g., 25 packets), whereas the callee’s was in the single digits (e.g., 5 packets). This characteristic is clearly observable in the rectangular area highlighted in Figure 7.
  2. Facebook Messenger
Unlike WhatsApp, Facebook Messenger presents greater challenges in distinguishing between caller and callee roles based solely on the packet count asymmetry described previously. However, it exhibits a distinct signaling artifact that facilitates correlation. Across all observed scenarios involving the caller and the callee, packets containing identical STUN Binding Request User IDs were consistently detected in the traffic traces of both endpoints. As illustrated in Figure 9, the specific User IDs (e.g., STQp and HB3x) appeared in the captured traffic of both the caller and the callee. This shared identifier provides a deterministic mechanism to correlate the two communication endpoints, thereby directly revealing the relationship between the caller and the callee.
  3. Snapchat
Snapchat exhibited the same STUN binding request user IDs in both directions only in the caller-terminated and callee-terminated conversation scenarios. As shown in Figure 10, the same user IDs (etWfrBXcT37Sr2fy and vODwOvco+uiqxGec) appeared in packets from both the caller and the callee. This characteristic indicates that the two communication endpoints can be correlated, revealing the relationship between the caller and the callee.
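The matching of STUN user IDs across two captures can be sketched as follows. The parser follows the standard STUN message layout (20-byte header with magic cookie 0x2112A442, followed by type-length-value attributes, with USERNAME as attribute type 0x0006). Everything else is an assumption made for illustration: the function names are hypothetical, the input is taken to be raw UDP payloads already extracted from each capture, and splitting on ":" assumes ICE-style usernames built from two fragments, which the study does not specify.

```python
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
ATTR_USERNAME = 0x0006


def stun_username(payload: bytes):
    """Return the USERNAME attribute of a STUN Binding Request, or None."""
    if len(payload) < 20:
        return None
    msg_type, msg_len, cookie = struct.unpack_from("!HHI", payload, 0)
    if msg_type != BINDING_REQUEST or cookie != MAGIC_COOKIE:
        return None
    offset, end = 20, min(20 + msg_len, len(payload))
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack_from("!HH", payload, offset)
        value = payload[offset + 4:offset + 4 + attr_len]
        if attr_type == ATTR_USERNAME:
            return value.decode("utf-8", errors="replace")
        offset += 4 + attr_len + (-attr_len % 4)  # attribute values are padded to 4 bytes
    return None


def user_ids(payloads):
    """Collect ID fragments from all Binding Requests in one capture."""
    ids = set()
    for p in payloads:
        name = stun_username(p)
        if name:
            ids.update(name.split(":"))  # assumption: ICE-style "a:b" usernames
    return ids


def shared_user_ids(caller_payloads, callee_payloads):
    """Return the IDs that appear in both captures."""
    return user_ids(caller_payloads) & user_ids(callee_payloads)


def build_binding_request(username: str) -> bytes:
    """Build a minimal Binding Request for testing (illustration only)."""
    value = username.encode()
    padded = value + b"\x00" * (-len(value) % 4)
    attr = struct.pack("!HH", ATTR_USERNAME, len(value)) + padded
    header = struct.pack("!HHI", BINDING_REQUEST, len(attr), MAGIC_COOKIE) + bytes(12)
    return header + attr


caller_capture = [build_binding_request("STQp:HB3x"), b"\x00" * 8]
callee_capture = [build_binding_request("HB3x:STQp")]
print(sorted(shared_user_ids(caller_capture, callee_capture)))
```

A non-empty intersection between two subscribers' captures links them directly, without relying on timing or volume statistics.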

4.7. Discussion

The preceding experiments analyzed six metadata dimensions: endpoint identifiability, server connectivity, call duration symmetry, traffic volume symmetry, service type characteristics, and signaling artifacts. The results demonstrate that all three major IM applications contain several exploitable characteristics that could compromise user anonymity.
First, concerning endpoint identifiability, all three IM applications exhibited varying degrees of IP address disclosure, particularly within the caller-terminated and callee-terminated conversation scenarios. This vulnerability was most pronounced in Facebook Messenger, where the caller’s public and private IP addresses were consistently exposed in the callee’s traffic, even in instances where the call was not answered or successfully established.
Regarding server connectivity, the analysis confirmed a consistent finding across all five call scenarios: the caller and callee in the three IM applications invariably connected to the same centralized server or server cluster during their sessions. The ubiquitous use of UDP port 3478 across these platforms is a significant characteristic that can be leveraged by a network observer to reliably determine whether a user is engaging in real-time call services.
The overall traffic analysis revealed high temporal and volumetric symmetry in most call sessions among the three IM applications. Furthermore, the capacity for service type differentiation is strong. The traffic volume differences between voice and video calls in established conversation scenarios were highly distinct, as video calls consistently generated significantly higher traffic loads.
In addition, further protocol-level analysis revealed that each of the three IM applications exhibited identifying features: a caller/callee packet-count asymmetry in WhatsApp, and identical STUN binding request user IDs in the traffic of both endpoints in Facebook Messenger and Snapchat. These protocol features can either directly identify the caller or be utilized to correlate both communicating parties with high confidence.
These findings provide compelling empirical evidence that the metadata generated by current IM applications remains highly identifiable, creating significant privacy vulnerabilities that may expose both user identities and their communication activities. Despite the security provided by end-to-end encryption for content, these IM applications continue to leak identifiable metadata that enables adversaries to reconstruct communication networks and infer user relationships, posing a critical and unresolved privacy threat.

5. Conclusions

This study systematically investigates identity-related leakage in IM call services by analyzing endpoint observability, server connectivity, call duration, and traffic characteristics. By generating and recording traffic traces on both the caller and callee sides for three widely used IM applications, the results show that identity-related leakage can arise across all call service scenarios, regardless of the specific platform. In some cases, the caller or callee IP address is directly observable. In others, distinctive and temporally aligned traffic patterns appear on both sides, enabling an observer to infer the communication partners. When collected at scale, these inferences can be aggregated to reconstruct edges in users’ social graphs and may facilitate linkage to real-world identities, thereby weakening the anonymity and privacy expectations that IM applications are intended to provide.
The 2022 decision of the Court of Justice of the European Union, which prohibits indiscriminate data retention but permits targeted retention under restricted conditions such as defined user groups, geographic areas, or limited time periods [19], suggests a policy trend toward selective metadata retention. Although such policies aim to balance law enforcement needs and privacy protection, many governments and telecommunication carriers already retain network connection records in practice [13,14,15,16]. Meanwhile, rapid advances in artificial intelligence enable large-scale traffic analysis and behavioral profiling, further amplifying privacy risks. These observations motivate the need for stronger privacy protections that mitigate metadata-based identity exposure in modern communication systems. This study provides empirical evidence and design guidance to inform IM application providers. In particular, reducing identity exposure from call metadata will require architectural improvements so that privacy objectives are met in practice.
To address this continuing challenge, future work will expand the empirical evaluation to a broader range of device models, operating systems, and IM applications to further assess the generality of the observed metadata patterns.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. David Curry Messaging App Revenue and Usage Statistics (2025)-Business of Apps. Available online: https://www.businessofapps.com/data/messaging-app-market/ (accessed on 27 September 2025).
  2. Laura Ceci Mobile Messenger and Communication Apps-Statistics & Facts. Available online: https://www.statista.com/topics/1523/mobile-messenger-apps/?srsltid=AfmBOoryHFFyL16fq6uj4CCHLhvPmSbHBVe7O8DWe-7_xygcZk0CVy8I#topicOverview (accessed on 29 September 2025).
  3. Karpisek, F.; Baggili, I.; Breitinger, F. WhatsApp Network Forensics: Decrypting and Understanding the WhatsApp Call Signaling Messages. Digit. Investig. 2015, 15, 110–118.
  4. Tsai, F.-C.; Chang, E.-C.; Kao, D.-Y. WhatsApp Network Forensics: Discovering the Communication Payloads behind Cybercriminals. In Proceedings of the 2018 20th International Conference on Advanced Communication Technology (ICACT), Chuncheon, Republic of Korea, 11–14 February 2018; IEEE: New York, NY, USA, 2018; pp. 679–684.
  5. Ahmed, W.; Shahzad, F.; Javed, A.R.; Iqbal, F.; Ali, L. WhatsApp Network Forensics: Discovering the IP Addresses of Suspects. In Proceedings of the 2021 11th IFIP International Conference on New Technologies, Mobility and Security, NTMS 2021, Paris, France, 19–21 April 2021.
  6. Afzal, A.; Hussain, M.; Saleem, S.; Shahzad, M.K.; Ho, A.T.S.; Jung, K.H. Encrypted Network Traffic Analysis of Secure Instant Messaging Application: A Case Study of Signal Messenger App. Appl. Sci. 2021, 11, 7789.
  7. Rathi, K.; Karabiyik, U.; Aderibigbe, T.; Chi, H. Forensic Analysis of Encrypted Instant Messaging Applications on Android. In Proceedings of the 6th International Symposium on Digital Forensic and Security, ISDFS 2018-Proceeding, Antalya, Turkey, 22–25 March 2018; Institute of Electrical and Electronics Engineers Inc.: New York, NY, USA, 2018; Volume 2018, pp. 1–6.
  8. Zhang, H.; Chen, L.; Liu, Q. Digital Forensic Analysis of Instant Messaging Applications on Android Smartphones. In Proceedings of the 2018 International Conference on Computing, Networking and Communications (ICNC), Maui, HI, USA, 5–8 March 2018; IEEE: New York, NY, USA, 2018; pp. 647–651.
  9. Choi, J.; Yu, J.; Hyun, S.; Kim, H. Digital Forensic Analysis of Encrypted Database Files in Instant Messaging Applications on Windows Operating Systems: Case Study with KakaoTalk, NateOn and QQ Messenger. Digit. Investig. 2019, 28, S50–S59.
  10. Keshvadi, S.; Karamollahi, M.; Williamson, C. Traffic Characterization of Instant Messaging Apps: A Campus-Level View. In Proceedings of the Conference on Local Computer Networks, LCN, Sydney, Australia, 16–19 November 2020; IEEE Computer Society: Sydney, Australia, 2020; Volume 2020, pp. 225–232.
  11. Coull, S.E.; Dyer, K.P. Traffic Analysis of Encrypted Messaging Services. ACM SIGCOMM Comput. Commun. Rev. 2014, 44, 5–11.
  12. IT-Political Association of Denmark Published Written Evidence (IPB0051) of Investigatory Powers Bill. Available online: http://data.parliament.uk/WrittenEvidence/CommitteeEvidence.svc/EvidenceDocument/Science and Technology/Investigatory Powers Bill Technology issues/written/25190.html (accessed on 25 June 2025).
  13. Sandvine Must Make Good on Its Commitments and Stop Harming Human Rights. Available online: https://www.accessnow.org/press-release/joint-letter-to-sandvine-on-announced-reforms/ (accessed on 30 November 2025).
  14. Marczak, B.; Dalek, J.; McKune, S.; Senft, A.; Scott-Railton, J.; Deibert, R. BAD TRAFFIC: Sandvine’s PacketLogic Devices Used to Deploy Government Spyware in Turkey and Redirect Egyptian Users to Affiliate Ads? Available online: https://citizenlab.ca/2018/03/bad-traffic-sandvines-packetlogic-devices-deploy-government-spyware-turkey-syria/ (accessed on 30 November 2025).
  15. U.S. Blocklists Sandvine for Enabling Digital Repression in Egypt. Available online: https://www.accessnow.org/press-release/us-blocklists-sandvine-for-digital-repression-in-egypt/ (accessed on 30 November 2025).
  16. Global Deep Packet Inspection (DPI) Industry Report 2024-2030: DPI Deployments Soar amid Growing Emphasis on Network Security. Available online: https://finance.yahoo.com/news/global-deep-packet-inspection-dpi-083600291.html (accessed on 30 November 2025).
  17. A Massive Database of 8 Billion Thai Internet Records Leaks. Available online: https://techcrunch.com/2020/05/24/thai-billions-internet-records-leak/ (accessed on 27 September 2025).
  18. The UK Home Office Home Office Report on the Operation of the Investigatory Powers Act 2016 (Accessible Version). Available online: https://www.gov.uk/government/publications/report-on-the-operation-of-the-investigatory-powers-act-2016/home-office-report-on-the-operation-of-the-investigatory-powers-act-2016-accessible-version#chapter-2-review-outcomes (accessed on 19 February 2025).
  19. Joined Cases C-793/19 and C-794/19 SpaceNet and Telekom Deutschland ECLI:EU:C:2022:702 (‘SpaceNet’). Available online: https://curia.europa.eu/juris/document/document.jsf?text=&docid=265881&pageIndex=0&doclang=EN&mode=req&dir=&occ=first&part=1&cid=623107#Footnote* (accessed on 25 June 2025).
  20. Hofstede, R.; Čeleda, P.; Trammell, B.; Drago, I.; Sadre, R.; Sperotto, A.; Pras, A. Flow Monitoring Explained: From Packet Capture to Data Analysis with NetFlow and IPFIX. IEEE Commun. Surv. Tutor. 2014, 16, 2037–2064.
  21. Cherukuri, A.K.; Ikram, S.T.; Li, G.; Liu, X. Encrypted Network Traffic Analysis; Springer: Berlin/Heidelberg, Germany, 2024.
  22. Naboulsi, D.; Fiore, M.; Ribot, S.; Stanica, R. Large-Scale Mobile Traffic Analysis: A Survey. IEEE Commun. Surv. Tutor. 2016, 18, 124–161.
  23. Lin, P.; Ye, K.; Hu, Y.; Lin, Y.; Xu, C.Z. A Novel Multimodal Deep Learning Framework for Encrypted Traffic Classification. IEEE/ACM Trans. on Netw. 2023, 31, 1369–1384.
  24. Pathmaperuma, M.H.; Rahulamathavan, Y.; Dogan, S.; Kondoz, A.M. Deep Learning for Encrypted Traffic Classification and Unknown Data Detection. Sensors 2022, 22, 7643.
  25. Shen, M.; Ye, K.; Liu, X.; Zhu, L.; Kang, J.; Yu, S.; Li, Q.; Xu, K. Machine Learning-Powered Encrypted Network Traffic Analysis: A Comprehensive Survey. IEEE Commun. Surv. Tutor. 2023, 25, 791–824.
  26. Papadogiannaki, E.; Ioannidis, S. A Survey on Encrypted Network Traffic Analysis Applications, Techniques, and Countermeasures. ACM Comput. Surv. (CSUR) 2021, 54, 1–35.
  27. Alblooshi, A.; Aljneibi, N.; Iqbal, F.; Ikuesan, R.; Badra, M.; Khalid, Z. Smartphone Forensics: A Comparative Study of Common Mobile Phone Models. In Proceedings of the 12th International Symposium on Digital Forensics and Security, San Antonio, TX, USA, 29–30 April 2024.
  28. Sarhan, S.A.E.; Youness, H.A.; Bahaa-Eldin, A.M.; Taha, A.E. VoIP Network Forensics of Instant Messaging Calls. IEEE Access 2024, 12, 9012–9024.
  29. Prabowo, W.A.; Mohsen, F.; Selamat, S.R. WhatsApp Mobile Applications in the Lens of Digital Forensics: Deciphering the Msgstore.Db.Crypt14 File. J. Cyber Secur. Mobil. 2025, 14, 823–848.
  30. Trammell, B.; Boschi, E. An Introduction to IP Flow Information Export (IPFIX). IEEE Commun. Mag. 2011, 49, 89–95.
  31. Finsterbusch, M.; Richter, C.; Rocha, E.; Müller, J.A.; Hänßgen, K. A Survey of Payload-Based Traffic Classification Approaches. IEEE Commun. Surv. Tutor. 2014, 16, 1135–1156.
  32. Sherry, J.; Lan, C.; Popa, R.A.; Ratnasamy, S. BlindBox: Deep Packet Inspection over Encrypted Traffic. In Proceedings of the ACM SIGCOMM Computer Communication Review 45, Coimbra, Portugal, 8–11 September 2025.
  33. WhatsApp User Statistics 2025: How Many People Use WhatsApp? Available online: https://backlinko.com/whatsapp-users (accessed on 28 September 2025).
  34. Danish Administrative Order for Data Retention (Logningsbekendtgørelsen). Available online: https://www.retsinformation.dk/eli/lta/2006/988 (accessed on 25 June 2025).
  35. Comparison of Internet Connection Records in the Investigatory Powers Bill with Danish Internet Session Logging Legislation. 2016. Available online: https://assets.publishing.service.gov.uk/media/5a81b29840f0b62302698b3c/Comparison_of_ICRs_with_Danish_Session_Logging.pdf (accessed on 25 June 2025).
  36. Investigatory Powers Act 2016. 2016. Available online: https://www.legislation.gov.uk/ukpga/2016/25/contents (accessed on 25 June 2025).
  37. Operational Case for the Retention of Internet Connection Records. 2015. Available online: https://assets.publishing.service.gov.uk/media/5a751224e5274a3cb28696be/Operational_Case_for_the_Retention_of_Internet_Connection_Records_-_IP_Bill_introduction.pdf (accessed on 25 June 2025).
  38. Annual Report of the Investigatory Powers Commissioner 2019. 2020. Available online: https://ipco-wpmedia-prod-s3.s3.eu-west-2.amazonaws.com/IPC-Annual-Report-2019_Web-Accessible-version_final.pdf (accessed on 25 June 2025).
  39. Investigatory Powers Bill Factsheet–Internet Connection Records; The UK Government: London, UK, 2015.
  40. Wang, Q.; Yahyavi, A.; Kemme, B.; He, W. I Know What You Did on Your Smartphone: Inferring App Usage over Encrypted Data Traffic. In Proceedings of the 2015 IEEE Conference on Communications and Network Security, Florence, Italy, 28–30 September 2015; pp. 433–441. [Google Scholar] [CrossRef]
  41. Taylor, V.F.; Spolaor, R.; Conti, M.; Martinovic, I. Robust Smartphone App Identification via Encrypted Network Traffic Analysis. IEEE Trans. on Infor. Foren. Secur. 2018, 13, 63–78. [Google Scholar] [CrossRef]
  42. Aceto, G.; Ciuonzo, D.; Montieri, A.; Pescapé, A. Multi-Classification Approaches for Classifying Mobile App Traffic. J. Netw. Comput. Appl. 2018, 103, 131–145. [Google Scholar] [CrossRef]
  43. Wang, S.; Chen, Z.; Zhang, L.; Yan, Q.; Yang, B.; Peng, L.; Jia, Z. TrafficAV: An Effective and Explainable Detection of Mobile Malware Behavior Using Network Traffic. In Proceedings of the 2016 IEEE/ACM 24th International Symposium on Quality of Service, Beijing, China, 20-21 June 2016. [Google Scholar] [CrossRef]
  44. Wijesekera, P.; Baokar, A.; Hosseini, A.; Egelman, S.; Wagner, D.; Beznosov, K. Android Permissions Remystified: A Field Study on Contextual Integrity. In Proceedings of the 24th USENIX Security Symposium 2015, Washington, DC, USA, 12–14 August 2015; pp. 499–514. [Google Scholar]
  45. Xu, S.; Sen, S.; Morley Mao, Z. CSI: Inferring Mobile ABR Video Adaptation Behavior under HTTPS and QUIC. In Proceedings of the 15th European Conference on Computer Systems, EuroSys 2020, Heraklion, Greece, 27–30 April 2020; p. 16. [Google Scholar] [CrossRef]
  46. Lopez-Martin, M.; Carro, B.; Sanchez-Esguevillas, A.; Lloret, J. Network Traffic Classifier with Convolutional and Recurrent Neural Networks for Internet of Things. IEEE Access 2017, 5, 18042–18050. [Google Scholar] [CrossRef]
  47. He, Q.; Moayyedi, A.; Dan, G.; Koudouridis, G.P.; Tengkvist, P. A Meta-Learning Scheme for Adaptive Short-Term Network Traffic Prediction. IEEE J. Sel. Areas Commun. 2020, 38, 2271–2283. [Google Scholar] [CrossRef]
  48. Yang, C.; Xiong, G.; Zhang, Q.; Shi, J.; Gou, G.; Li, Z.; Liu, C. Few-Shot Encrypted Traffic Classification via Multi-Task Representation Enhanced Meta-Learning. Comput. Netw. 2023, 228, 109731. [Google Scholar] [CrossRef]
  49. Hu, Y.; Wu, J.; Li, G.; Li, J.; Cheng, J. Privacy-Preserving Few-Shot Traffic Detection Against Advanced Persistent Threats via Federated Meta Learning. IEEE Trans. Netw. Sci. Eng. 2024, 11, 2549–2560. [Google Scholar] [CrossRef]
  50. Zhao, J.; Li, Q.; Hong, Y.; Shen, M. MetaRockETC: Adaptive Encrypted Traffic Classification in Complex Network Environments via Time Series Analysis and Meta-Learning. IEEE Trans. Netw. Serv. Man. 2024, 21, 2460–2476. [Google Scholar] [CrossRef]
  51. Feng, Y.; Li, J.; Mirkovic, J.; Wu, C.; Wang, C.; Ren, H.; Xu, J.; Liu, Y. Unmasking the Internet: A Survey of Fine-Grained Network Traffic Analysis. IEEE Commun. Surv. Tutor. 2025, 27, 3672–3709. [Google Scholar] [CrossRef]
  52. ETSITR 103 829-V1.1.1; Lawful Interception (LI); IP Address Retention and Traceability. ETSI: Sophia Antipolis, France, 2022.
  53. Lawful Interception. Available online: https://netquestcorp.com/lawful-intercept/ (accessed on 6 December 2025).
  54. Lawful Interception Solutions. Available online: https://www.ips-intelligence.com/en/lawful-interception (accessed on 6 December 2025).
  55. Bąkowski, P. Access to Data for Law Enforcement: Lawful Interception; European Parliament: Brussels, Belgium, July 2025. [Google Scholar]
Figure 1. Experimental network architecture.
Figure 2. Exposure of the Facebook Messenger caller’s IP addresses observed in the callee’s traffic during three unanswered call scenarios.
Figure 3. Example traffic patterns in a WhatsApp call (Caller: mobile network; Callee: Wi-Fi network). This figure demonstrates the disclosure of both private and public IP addresses for both communication parties, as well as the hybrid connection topology where server-relayed communication and direct peer-to-peer transmission coexist. In this figure, red and green boxes indicate private and public IP addresses, respectively. Purple arrows/boxes represent server-relayed communication, while orange arrows/boxes represent direct traffic transmission.
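The private/public distinction marked in Figure 3 can be reproduced mechanically when post-processing a capture. The following is a minimal sketch using Python's standard ipaddress module. The function name classify_address is hypothetical, and treating carrier-grade NAT space (100.64.0.0/10) as its own class is an assumption about what an analyst of mobile traffic would want, not something stated in the figure.

```python
import ipaddress

# RFC 6598 shared address space, commonly used for carrier-grade NAT.
CGNAT = ipaddress.ip_network("100.64.0.0/10")


def classify_address(addr: str) -> str:
    """Label an address as 'private', 'cgnat', or 'public' (illustrative labels)."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4 and ip in CGNAT:
        return "cgnat"
    if ip.is_private or ip.is_link_local or ip.is_loopback:
        return "private"
    return "public"


if __name__ == "__main__":
    # 31.13.87.50 is one of the server addresses listed in Table 4;
    # the other addresses are made-up examples.
    for sample in ("192.168.1.23", "100.72.1.1", "31.13.87.50"):
        print(sample, classify_address(sample))
```

Separating the carrier-grade NAT range matters mostly for mobile subscribers, whose handset-side address may be neither RFC 1918 private nor globally routable.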
Figure 4. Example of public and private IP address disclosure (highlighted by the red box) within Facebook Messenger call traffic during server-relayed communication.
Figure 5. Traffic characteristics in Snapchat under different transmission modes. (a) Direct traffic transmission between communicating parties, revealing the public IP addresses of both parties. In this figure, the blue and green boxes indicate the public IP addresses of the callee and the caller, respectively. The dark yellow arrows and boxes represent direct traffic transmission. (b) Server-relayed communication in which packets during the call phase still leak public IP addresses. Similarly, the blue and green boxes indicate the public IP addresses of the callee and the caller, respectively. The purple arrows and boxes represent server-relayed communication.
Figure 6. Comparison of server hostnames and session durations observed in Facebook Messenger call sessions. The cyan box highlights server-relayed traffic, while the dark yellow box represents direct traffic transmission.
Figure 7. Comparison of traffic behavior between voice and video calls in WhatsApp under the system-timeout termination scenario. (a) Voice call; (b) Video call. The blue boxes highlight the specific packet counts observed, demonstrating the difference in traffic volume between the two modes.
Figure 8. Comparison of overall traffic volumes between voice and video calls of similar duration in the conversation-termination scenarios. (a) WhatsApp, (b) Facebook Messenger, and (c) Snapchat. The cyan boxes highlight the traffic entries for the voice and video call sessions whose overall volumes are compared.
Figure 9. Example of identical STUN Binding Request User IDs observed between the caller and callee in Facebook Messenger. The blue boxes highlight specific byte sequences in the hex dump and their corresponding ASCII characters on the right.
Figure 10. Example of identical STUN binding request user IDs (highlighted in blue) observed between the caller and callee in Snapchat.
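The comparison illustrated in Figures 9 and 10 can be approximated offline once the UDP payloads have been exported. The following is a minimal sketch, not the author's tooling. It assumes that the "user ID" shown in the hex dumps is carried in the STUN USERNAME attribute (type 0x0006 in RFC 5389), which the captions do not state explicitly. The helper names stun_username and build_binding_request are hypothetical, and the sample payloads are synthetic rather than captured data.

```python
import struct
from typing import Optional

STUN_MAGIC_COOKIE = 0x2112A442  # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
ATTR_USERNAME = 0x0006


def stun_username(payload: bytes) -> Optional[bytes]:
    """Return the USERNAME value of a STUN Binding Request, or None."""
    if len(payload) < 20:
        return None
    msg_type, msg_len, cookie = struct.unpack("!HHI", payload[:8])
    if msg_type != BINDING_REQUEST or cookie != STUN_MAGIC_COOKIE:
        return None
    end = min(20 + msg_len, len(payload))
    offset = 20  # attributes start after the 20-byte header
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", payload[offset:offset + 4])
        value = payload[offset + 4:offset + 4 + attr_len]
        if attr_type == ATTR_USERNAME and len(value) == attr_len:
            return value
        # Attribute values are padded to a multiple of 4 bytes.
        offset += 4 + attr_len + (-attr_len % 4)
    return None


def build_binding_request(username: bytes, txn_id: bytes) -> bytes:
    """Build a synthetic Binding Request (test helper, not captured traffic)."""
    padded = username + b"\x00" * (-len(username) % 4)
    attr = struct.pack("!HH", ATTR_USERNAME, len(username)) + padded
    header = struct.pack("!HHI", BINDING_REQUEST, len(attr), STUN_MAGIC_COOKIE)
    return header + txn_id + attr


if __name__ == "__main__":
    caller_pkt = build_binding_request(b"abcd:wxyz", b"\x01" * 12)
    callee_pkt = build_binding_request(b"abcd:wxyz", b"\x02" * 12)
    print(stun_username(caller_pkt) == stun_username(callee_pkt))
```

Two flows whose Binding Requests yield the same value would then be candidates for the same call session. Whether equality alone is sufficient depends on how each application constructs the identifier.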
Table 1. Hardware and software configurations of the experimental setup.
| Experimental Devices/Tools | Description | Specification/Versions |
| --- | --- | --- |
| Mobile Device 1 | Generate IM network traffic and capture from the mobile network directly | Google Pixel 6; Android 16; Magisk v29.0 (for root access); Android tcpdump 4.99.5; Network Analyzer 4.1; Connectivity: Commercial LTE Network (Operator A) |
| Mobile Device 2 | Generate IM network traffic | Google Pixel 6 Pro; Android 16; Network Analyzer 4.1 |
| Laptop 1 | Control Android tcpdump on Mobile Device 1 and analyze the captured traffic | Dell Inspiron 16 Plus 7610; Microsoft Windows 11 Pro; Android Debug Bridge 1.0.41; Wireshark 4.4.6 |
| Laptop 2 | Provide Wi-Fi hotspot to Mobile Device 2 and capture traffic; analyze captured traffic | HP ZBook Power G10; Microsoft Windows 11 Pro; Wireshark 4.4.6 |
| Home Router | Provide Internet connectivity for Laptop 2 | TP-Link Deco X10; Connectivity: Fixed Broadband Network (Operator B) |
Table 2. IM applications with version information and supported call services.
| Application | Version | Supported Call Services |
| --- | --- | --- |
| WhatsApp | 2.25.26.74 | Voice, Video |
| Facebook Messenger | 526.0.0.52.108 | Voice, Video |
| Snapchat | 13.60.0.57 | Voice, Video |
Table 3. Endpoint identifiability across five call scenarios in three IM applications.
| Scenarios | WhatsApp | Facebook Messenger | Snapchat |
| --- | --- | --- | --- |
| Caller-initiated cancellation | No disclosure (12/12) | Caller’s public and private IPs are visible in the callee’s traffic (12/12) | No disclosure (12/12) |
| Callee-initiated rejection | No disclosure (12/12) | Caller’s public and private IPs are visible in the callee’s traffic (12/12) | No disclosure (12/12) |
| System timeout termination | No disclosure (12/12) | Caller’s public and private IPs are visible in the callee’s traffic (12/12) | No disclosure (12/12) |
| Caller-terminated conversation | Public and private IPs of both parties are disclosed only when the caller is on the mobile network (6/6). Routing behavior: Mobile Caller: P2P preferred (6/6); Wi-Fi Caller: Server-relay preferred (6/6). | Public and private IPs of both parties disclosed (12/12). Routing behavior: Mobile Caller: P2P preferred (4/6), Server-relay observed (2/6); Wi-Fi Caller: Server-relay preferred (6/6). | Public IPs of both parties disclosed (12/12). Routing behavior: Mobile Caller: P2P observed (1/6), Server-relay observed (5/6); Wi-Fi Caller: P2P observed (3/6), Server-relay observed (3/6). |
| Callee-terminated conversation | Public and private IPs of both parties are disclosed only when the caller is on the mobile network (6/6). Routing behavior: Mobile Caller: P2P preferred (6/6); Wi-Fi Caller: Server-relay preferred (6/6). | Public and private IPs of both parties disclosed (12/12). Routing behavior: Mobile Caller: P2P preferred (5/6), Server-relay observed (1/6); Wi-Fi Caller: Server-relay preferred (6/6). | Public IPs of both parties disclosed (12/12). Routing behavior: Mobile Caller: Server-relay observed (6/6); Wi-Fi Caller: P2P observed (3/6), Server-relay observed (3/6). |
Note: The values in parentheses represent the ratio of successful occurrences to the total number of trials conducted for that specific scenario (n/N). The total number of trials is 12 for Scenarios 1-3, and 6 for the network-specific sub-scenarios (Mobile Caller versus Wi-Fi Caller) within Scenarios 4 and 5 (each scenario includes both voice and video trials).
Table 4. Connectivity servers and ports for IM application call services.
| Application | Observed Server Addresses | Observed Ports |
| --- | --- | --- |
| WhatsApp | 31.13.87.50; 157.240.209.62; 31.13.82.48 | 3478 |
| Facebook Messenger | 31.13.87.2; 31.13.87.54; 31.13.87.128; 157.240.31.57; 157.240.209.57 | 3478, 40003 |
| Snapchat | 13.200.139.250; 35.190.43.134; 35.244.195.33 | 443, 3478 |
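The addresses and ports in Table 4 lend themselves to a simple lookup when screening flow records for call traffic. The sketch below is illustrative only: the Flow record, label_flow, and shared_server are hypothetical names, the subscriber labels are made up, and the rule that two subscribers "share" a server when they contact the same listed address is an assumption about how shared server connectivity might be tested, not the author's procedure. The server list reflects only what was observed in this study and would differ by region and over time.

```python
from typing import NamedTuple, Optional

# Server addresses and ports as listed in Table 4.
CALL_SERVERS = {
    "WhatsApp": (
        {"31.13.87.50", "157.240.209.62", "31.13.82.48"},
        {3478},
    ),
    "Facebook Messenger": (
        {"31.13.87.2", "31.13.87.54", "31.13.87.128",
         "157.240.31.57", "157.240.209.57"},
        {3478, 40003},
    ),
    "Snapchat": (
        {"13.200.139.250", "35.190.43.134", "35.244.195.33"},
        {443, 3478},
    ),
}


class Flow(NamedTuple):
    subscriber: str   # hypothetical subscriber label
    remote_ip: str
    remote_port: int


def label_flow(flow: Flow) -> Optional[str]:
    """Return the application whose listed server and port match the flow."""
    for app, (addresses, ports) in CALL_SERVERS.items():
        if flow.remote_ip in addresses and flow.remote_port in ports:
            return app
    return None


def shared_server(a: Flow, b: Flow) -> bool:
    """True if two different subscribers reach the same listed call server."""
    app = label_flow(a)
    return (
        app is not None
        and a.subscriber != b.subscriber
        and a.remote_ip == b.remote_ip
        and label_flow(b) == app
    )


if __name__ == "__main__":
    f1 = Flow("subscriber-A", "31.13.87.50", 3478)
    f2 = Flow("subscriber-B", "31.13.87.50", 3478)
    print(label_flow(f1), shared_server(f1, f2))
```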
Table 5. Mapping of key observed server hostnames to IP addresses in Facebook Messenger call sessions.
| Host Name | IP Address |
| --- | --- |
| edge-turnservices6-shv-01-tpe1.facebook.com | 2a03:2880:f217:c0:face:b00c:0:553e |
| edge-turnservices-shv-01-tpe1.facebook.com | 31.13.87.54 |
| edgeray-msgr6-shv-01-tpe1.facebook.com | 2a03:2880:f217:ce:face:b00c:0:74fd |
| edgeray-msgr-shv-01-tpe1.facebook.com | 31.13.87.128 |
| edgeray-msgr6-shv-01-itm1.facebook.com | 2a03:2880:f24e:cd:face:b00c:0:74fd |
| edgeray-msgr-shv-01-itm1.facebook.com | 157.240.209.57 |
| edgeray-msgr6-shv-01-nrt1.facebook.com | 2a03:2880:f20f:1ce:face:b00c:0:74fd |
| edgeray-msgr-shv-01-nrt1.facebook.com | 157.240.31.57 |
Table 6. Statistical summary of call duration differences across different IM applications. Each cell gives the median (IQR), followed by the [minimum, maximum] range.
| Application | Caller Device | Voice Call: Absolute (s) | Voice Call: Normalized | Video Call: Absolute (s) | Video Call: Normalized |
| --- | --- | --- | --- | --- | --- |
| WhatsApp | Device 1 | 0.894 (1.735) [0.068, 2.519] | 0.020 (0.014) [0, 0.109] | 1.284 (1.479) [0.0588, 2.548] | 0.021 (0.021) [0.001, 0.143] |
| WhatsApp | Device 2 | 1.242 (0.502) [0.294, 1.941] | 0.037 (0.068) [0.003, 0.273] | 1.663 (0.922) [0.725, 2.743] | 0.031 (0.049) [0.008, 0.288] |
| Facebook Messenger | Device 1 | 1.155 (0.869) [0.038, 2.510] | 0.021 (0.043) [0.004, 0.220] | 1.249 (0.841) [0.388, 2.439] | 0.017 (0.097) [0.006, 0.238] |
| Facebook Messenger | Device 2 | 2.493 (3.578) [0.868, 11.492] | 0.101 (0.096) [0.022, 0.152] | 2.030 (5.038) [0.688, 12.149] | 0.058 (0.102) [0.012, 0.219] |
| Snapchat | Device 1 | 1.352 (1.01) [0.898, 2.384] | 0.025 (0.039) [0.008, 0.453] | 1.309 (0.693) [0.089, 3.235] | 0.029 (0.038) [0.001, 0.084] |
| Snapchat | Device 2 | 1.184 (0.647) [0.060, 2.468] | 0.024 (0.054) [0.001, 0.195] | 1.096 (1.243) [0.229, 2.254] | 0.019 (0.045) [0.002, 0.161] |
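Summary statistics of the kind reported in Table 6 can be derived from paired caller-side and callee-side durations. The sketch below is a hedged illustration only. The normalization used for the table is not restated here, so dividing the absolute difference by the longer of the two durations is an assumption, as is the inclusive quartile method. The function names and the sample values are hypothetical and are not taken from the experiments.

```python
import statistics
from typing import Sequence, Tuple


def duration_differences(caller_s: float, callee_s: float) -> Tuple[float, float]:
    """Return (absolute, normalized) difference between two call durations.

    Normalizing by the longer duration is an assumption made for this sketch.
    """
    abs_diff = abs(caller_s - callee_s)
    longest = max(caller_s, callee_s)
    norm_diff = abs_diff / longest if longest > 0 else 0.0
    return abs_diff, norm_diff


def median_iqr(values: Sequence[float]) -> Tuple[float, float]:
    """Return (median, interquartile range) using inclusive quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q3 - q1


if __name__ == "__main__":
    # Synthetic (caller, callee) durations in seconds.
    pairs = [(60.0, 58.8), (30.0, 29.1), (45.0, 44.5), (20.0, 18.0), (90.0, 89.2)]
    abs_diffs = [duration_differences(a, b)[0] for a, b in pairs]
    print(median_iqr(abs_diffs))
```

Quartile conventions differ between tools, so IQR values computed this way may not match those produced by another statistics package on the same data.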
Table 7. Statistical summary of traffic volume (in packets) differences for established conversations (Scenarios 4 and 5). Each cell gives the median (IQR), followed by the [minimum, maximum] range.
| Application | Caller Device | Voice Call: Absolute (Packets) | Voice Call: Normalized | Video Call: Absolute (Packets) | Video Call: Normalized |
| --- | --- | --- | --- | --- | --- |
| WhatsApp | Device 1 | 1 (0.25) [1, 2] | 0.004 (0.037) [0.001, 0.056] | 14 (38) [3, 44] | 0.003 (0.004) [0.001, 0.006] |
| WhatsApp | Device 2 | 12 (6) [7, 19] | 0.014 (0.088) [0.006, 0.107] | 21.5 (53) [0, 71] | 0.012 (0.024) [0, 0.042] |
| Facebook Messenger | Device 1 | 1 (17) [0, 53] | 0.015 (0.039) [0, 0.040] | 20 (312) [1, 621] | 0.016 (0.064) [0, 0.077] |
| Facebook Messenger | Device 2 | 19.5 (13) [5, 27] | 0.008 (0.049) [0.003, 0.083] | 32 (688.5) [12, 2622] | 0.018 (0.143) [0.003, 0.181] |
| Snapchat | Device 1 | 10 (27.75) [3, 33] | 0.003 (0.013) [0.001, 0.043] | 1021 (1567.75) [42, 2329] | 0.067 (0.040) [0.041, 0.124] |
| Snapchat | Device 2 | 5 (7.75) [1, 11] | 0.003 (0.002) [0.002, 0.005] | 1405 (1856) [16, 2276] | 0.073 (0.065) [0, 0.130] |
Table 8. Statistical summary of traffic volume (in KB) differences for established conversations (Scenarios 4 and 5). Each cell gives the median (IQR), followed by the [minimum, maximum] range.
| Application | Caller Device | Voice Call: Absolute (KB) | Voice Call: Normalized | Video Call: Absolute (KB) | Video Call: Normalized |
| --- | --- | --- | --- | --- | --- |
| WhatsApp | Device 1 | 0.05 (1.25) [0, 2] | 0.003 (0.013) [0, 0.037] | 21 (30.75) [3, 42] | 0.004 (0.005) [0.001, 0.006] |
| WhatsApp | Device 2 | 1 (2.25) [1, 4] | 0.020 (0.078) [0.003, 0.111] | 13 (10.25) [1, 3] | 0.0038 (0.013) [0.003, 0.025] |
| Facebook Messenger | Device 1 | 1 (4.8) [0, 20] | 0.007 (0.064) [0, 0.067] | 7 (42) [0, 61] | 0.007 (0.042) [0, 0.222] |
| Facebook Messenger | Device 2 | 7 (6.5) [1, 9] | 0.019 (0.0832) [0.003, 0.189] | 8 (15) [0, 84] | 0.004 (0.159) [0.002, 0.415] |
| Snapchat | Device 1 | 42 (65.5) [6, 94] | 0.053 (0.013) [0.028, 0.060] | 82 (153) [2, 223] | 0.011 (0.004) [0.004, 0.014] |
| Snapchat | Device 2 | 3 (6.25) [1, 17] | 0.010 (0.064) [0.004, 0.070] | 34 (124.25) [4, 182] | 0.013 (0.036) [0.005, 0.097] |
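The small normalized differences in Tables 6 to 8 suggest how symmetry in duration and volume could be turned into a pairing rule over flow records. The following is a rough sketch under stated assumptions, not the matching procedure of the paper: the CallFlow record, the function names, the 5-second start-time window, and the 0.3 normalized-difference threshold are all hypothetical placeholders rather than values derived from the tables, and the sample flows are synthetic.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class CallFlow:
    subscriber: str     # hypothetical subscriber label
    start_s: float      # flow start time, seconds since a common epoch
    duration_s: float
    packets: int
    kilobytes: float


def normalized_diff(a: float, b: float) -> float:
    """Absolute difference divided by the larger value (assumed definition)."""
    top = max(a, b)
    return abs(a - b) / top if top > 0 else 0.0


def is_candidate_pair(a: CallFlow, b: CallFlow,
                      max_start_gap_s: float = 5.0,
                      max_norm_diff: float = 0.3) -> bool:
    """True if two flows from different subscribers look like one call."""
    if a.subscriber == b.subscriber:
        return False
    if abs(a.start_s - b.start_s) > max_start_gap_s:
        return False
    features = (
        (a.duration_s, b.duration_s),
        (a.packets, b.packets),
        (a.kilobytes, b.kilobytes),
    )
    return all(normalized_diff(x, y) <= max_norm_diff for x, y in features)


def candidate_pairs(flows: Sequence[CallFlow]) -> List[Tuple[CallFlow, CallFlow]]:
    """Return every unordered pair of flows that passes the symmetry checks."""
    return [(a, b) for a, b in combinations(flows, 2) if is_candidate_pair(a, b)]


if __name__ == "__main__":
    flows = [
        CallFlow("A", 0.0, 60.0, 3000, 400.0),
        CallFlow("B", 1.0, 59.0, 2990, 398.0),
        CallFlow("C", 2.0, 10.0, 500, 60.0),
    ]
    for a, b in candidate_pairs(flows):
        print(a.subscriber, b.subscriber)
```

In practice such a rule would be combined with the other indicators (endpoint disclosure, shared servers, service type), since symmetry alone may produce false matches among concurrent calls of similar length.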

Share and Cite

MDPI and ACS Style

Li, C.-Y. Identity Leakage in Encrypted IM Call Services: An Empirical Study of Metadata Correlation. Future Internet 2026, 18, 12. https://doi.org/10.3390/fi18010012
