SecureTeleMed: Privacy-Preserving Volumetric Video Streaming for Telemedicine

Hu, Kaiyuan; Ma, Deen; Qiu, Shi

doi:10.3390/electronics14173371

by Kaiyuan Hu ¹, Deen Ma ² and Shi Qiu ³,*

¹ Department of Computer Science, McGill University, Montreal, QC H3A 0G4, Canada
² College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060, China
³ Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
* Author to whom correspondence should be addressed.

Electronics 2025, 14(17), 3371; https://doi.org/10.3390/electronics14173371

Submission received: 30 June 2025 / Revised: 7 August 2025 / Accepted: 12 August 2025 / Published: 25 August 2025

(This article belongs to the Special Issue Big Data Security and Privacy)

Abstract

Volumetric video streaming holds transformative potential for telemedicine, enabling immersive remote consultations, surgical training, and real-time collaborative diagnostics. However, transmitting sensitive patient data (e.g., 3D medical scans, surgeon head/gaze movements) raises critical privacy risks, including exposure of biometric identifiers and protected health information (PHI). To address these concerns, we propose SecureTeleMed, a dual-track encryption scheme tailored for volumetric video-based telemedicine. SecureTeleMed combines viewport obfuscation and region of interest (ROI)-aware frame encryption to protect both patient data and clinician interactions while complying with healthcare privacy regulations (e.g., HIPAA, GDPR). Evaluations show that SecureTeleMed reduces privacy leakage by 89% compared to baseline encryption methods, with sub-50 ms latency suitable for real-time telemedicine applications.

Keywords: telemedicine security; medical data privacy; 3D medical visualization; encryption; security analysis

1. Introduction

Volumetric video [1] has emerged as a transformative technology for creating immersive, interactive experiences in applications such as virtual reality, telemedicine, and remote education [2]. Unlike traditional 2D video, which represents flat images, volumetric video captures 3D data, often in the form of point clouds or textured meshes, allowing users six degrees of freedom (6-DoF) to explore scenes from different perspectives. This capability is particularly promising for telemedicine, enabling applications like remote surgical consultations, collaborative diagnostics, and immersive medical training. For instance, in remote surgery scenarios, volumetric video can transmit the surgeon’s precise hand movements and the patient’s real-time 3D anatomy. However, this rich data stream simultaneously poses a significant risk, as it could expose critical procedural details or patient biometrics if intercepted.

Despite its immense potential, the widespread adoption of volumetric video in telemedicine faces critical technical hurdles: substantial data size and stringent real-time processing demands. These factors lead to significant challenges in bandwidth consumption and latency tolerance. Efficient and responsive streaming is crucial, especially for time-sensitive medical applications where low latency (e.g., sub-50 ms Motion-to-Photon latency) is essential for seamless and safe interactions.

More critically, the detailed nature of volumetric data introduces severe privacy risks that directly conflict with stringent healthcare regulations like HIPAA and GDPR. The transmitted data streams can expose a wealth of sensitive information. For patients, volumetric data often contains detailed biometric identifiers, such as facial contours and gait patterns, which could be exploited for unauthorized user identification [3]. Moreover, the transmission of raw or processed 3D medical data, such as DICOM files, risks leaking sensitive content, including patient identifiers or specific anatomical details [4]. From the clinician’s perspective, head and gaze tracking data can inadvertently reveal psychological states, decision-making processes, or procedural expertise. Studies have shown that such tracking data can be linked to medical conditions like autism [5], PTSD [6], and even be used in diagnosing dementia [7,8]. In a medical context, this could expose professional vulnerabilities. These multifaceted privacy risks highlight the urgent need for a robust, specialized privacy-preserving solution that can safeguard data for both patients and clinicians without compromising the real-time performance essential for advanced telemedicine applications.

To address these intertwined challenges of privacy, efficiency, and real-time performance, we propose SecureTeleMed, a dual-track encryption scheme specifically tailored for volumetric video-based telemedicine. SecureTeleMed employs two key privacy-preserving mechanisms: viewport obfuscation and frame-wise encryption with region of interest (ROI) mapping. Viewport obfuscation selectively encrypts high-priority segments in the viewport data stream based on a calculated Prediction Contribution Value (PCV), which identifies segments that are both critical for predicting future user behavior and highly sensitive from a privacy perspective. This targeted approach protects sensitive data, such as clinician head movements or patient biometrics, while maintaining the ultra-low latency required for real-time medical interactions. Additionally, SecureTeleMed uses frame-wise encryption with ROI mapping, where user interest is projected from the 3D space onto the 2D frame, assigning an interest score to each tile. High-interest areas, such as anatomical regions in 3D medical scans, receive stronger encryption, ensuring robust content privacy without overburdening the system’s computational resources.

By integrating these complementary techniques, SecureTeleMed offers a comprehensive solution for securing both user behavioral data (viewport) and rendered content (frames) in real-time telemedicine services. This adaptive approach not only meets stringent privacy requirements but also supports efficient streaming performance under tight latency constraints, making it particularly suitable for high-stakes medical applications. The contributions of this work include the following:

- Dual-Track Encryption Scheme: A novel scheme that protects both viewport trajectory data and rendered 2D frames, comprehensively addressing the distinct privacy challenges inherent in volumetric video streaming for telemedicine.
- ROI-Guided Selective Encryption: A dynamic mechanism that assigns encryption levels based on real-time user focus (ROI), effectively balancing robust privacy protection with the computational efficiency demanded by real-time medical interactions.
- Adaptive Frame-Wise Encryption: An encryption strategy where strength is varied according to each tile’s calculated interest score, optimizing data protection while maintaining the real-time performance necessary for medical data streams.

The remainder of this paper is structured as follows: Section 2 reviews related work in volumetric video, telemedicine, and privacy preservation techniques. Section 3 details the methodology of SecureTeleMed, including viewport obfuscation and frame-wise encryption mechanisms. Section 4 presents experimental results evaluating encryption efficiency and privacy-preserving performance. Section 5 discusses limitations and future work, and Section 6 concludes the paper.

2. Related Work

In this section, we review the state-of-the-art research across three key domains relevant to SecureTeleMed: (1) volumetric video, (2) telemedicine applications, and (3) data privacy preservation techniques. These areas collectively inform the design and motivation behind SecureTeleMed.

2.1. Volumetric Video

Volumetric video captures scenes in three dimensions using multi-view cameras, depth sensors, or LiDAR, enabling immersive playback or even live streaming with six degrees of freedom (6-DoF) [1]. Unlike traditional 2D or panoramic video, volumetric media allows users to freely navigate within a scene, making it ideal for applications like virtual reality (VR), augmented reality (AR), and remote collaboration.

Recent advances in compression standards such as MPEG’s Point Cloud Compression (PCC) [9] and mesh-based representations have enabled more efficient storage and transmission of volumetric data. Additionally, viewport-adaptive streaming techniques have been developed to reduce bandwidth consumption by prioritizing high-resolution rendering of regions likely to be viewed by the user [10]. Tile-based encoding and ROI-aware delivery are also gaining traction as effective strategies for optimizing quality of experience (QoE) under limited network conditions.

However, despite progress in efficiency and interactivity, volumetric video introduces new challenges related to security and privacy due to its rich representation of spatial and behavioral data. This makes it particularly sensitive when applied in healthcare settings, where both patient and clinician identities and actions must be protected.

2.2. Telemedicine

Telemedicine has rapidly evolved into a critical component of modern healthcare delivery, especially in light of global health crises and increasing demand for remote diagnostics and surgical training [11]. It enables real-time consultations, collaborative diagnosis, and expert guidance during procedures without requiring physical presence [12]. With advancements in networking, cloud computing, and medical imaging technologies, telemedicine systems now support richer modalities, including high-resolution 2D video, 3D medical scans, and, more recently, immersive volumetric video.

Several studies have explored the integration of VR/AR into clinical workflows. For instance, [13] investigates the use of immersive environments for mental health therapy, while [14] examines anonymization methods for facial 3D scans used in remote consultations. However, most existing telemedicine platforms focus on functional accuracy and system performance rather than end-to-end privacy and security, especially when handling advanced media formats like volumetric video.

Moreover, regulatory frameworks such as HIPAA [15] impose strict requirements on the protection of personal health information (PHI), mandating robust mechanisms for securing both content and behavioral data. As volumetric video becomes increasingly integrated into telemedicine pipelines, ensuring compliance with these regulations while maintaining interactive performance remains a major challenge.

2.3. Privacy Concerns in Volumetric Video

Volumetric streaming often requires the collection of detailed user data, including viewport information for adaptive streaming and interaction data in VR/AR settings. Sharing this data with a server introduces privacy risks, as it could expose personal viewing habits, preferences, and behavioral patterns if accessed by unauthorized parties. In clinical contexts, head and gaze tracking data may inadvertently reveal a clinician’s cognitive load, attention distribution, or even latent neurological conditions [5,6], posing additional ethical and professional risks.

Transmitting raw or processed 3D data for rendering may reveal sensitive content if intercepted or stored insecurely, potentially exposing identifiable elements within the streamed environment. For example, volumetric reconstructions of patients’ faces or anatomical structures may contain unique biometric identifiers that violate privacy norms and legal mandates [4]. Techniques such as blurring or pixelation—commonly used in 2D video—are insufficient in 3D environments due to viewpoint freedom and spatial reconstruction capabilities.

These multifaceted threats highlight the urgent need for holistic privacy-preserving mechanisms tailored to telemedicine applications. Our work addresses these gaps through a dual-track encryption framework that protects both rendered content and behavioral metadata in real time.

3. Method

This section details the technical framework of SecureTeleMed, a dual-track encryption scheme designed to address the unique privacy and real-time challenges of volumetric video in telemedicine. It first outlines the overall workflow integrating viewport obfuscation and frame-wise encryption with ROI mapping, then elaborates on each core mechanism: viewport obfuscation (which uses Prediction Contribution Value to selectively encrypt sensitive segments) and frame-wise encryption (which dynamically assigns encryption levels based on user perception sensitivity scores). These techniques are designed to balance robust privacy protection with the ultra-low latency required for real-time medical interactions.

3.1. Overview

Building on existing privacy-preserving solutions for video streaming, we propose SecureTeleMed, a system designed to secure both user behavior and video transmission in real-time telemedicine services. SecureTeleMed addresses the unique privacy challenges of telemedicine applications, such as remote surgery and collaborative diagnostics, by safeguarding sensitive information throughout the streaming process. As illustrated in Figure 1, SecureTeleMed integrates two key mechanisms: viewport obfuscation and frame-wise encryption with region of interest (ROI) mapping. These mechanisms ensure robust privacy protection while maintaining the low latency requirement for real-time medical interactions.

Viewport Obfuscation: In SecureTeleMed, we employ a cloud rendering architecture to transform the user’s 3D scene perception into 2D frames based on the uploaded viewport trajectory. To ensure a smooth viewing experience, the Motion-to-Photon (MTP) latency must be kept below 50 ms [16], leaving limited room for implementing protection schemes. To address this, we introduce viewport obfuscation, which selectively encrypts high-priority segments of the viewport data stream. This approach protects sensitive user data, such as clinician head and gaze movements, without compromising real-time performance.

Frame-wise Encryption: We devise frame-wise encryption to protect sensitive regions of rendered 2D frames. Using a User Region of Interest Occupancy Encryption approach, our SecureTeleMed scheme dynamically identifies and encrypts high-interest areas based on user attention scores. This avoids over-encryption, reduces computational overhead, and ensures low latency, which is critical for balancing real-time performance and data security in telemedicine.

3.2. Viewport Obfuscation

Volumetric video streaming in telemedicine demands ultra-low latency (under 50 ms) for tasks like remote surgery, which limits the encryption overhead that can be tolerated while maintaining real-time performance. Under these constraints, it is crucial to prioritize encrypting the data that is most critical for prediction accuracy and user privacy. In practice, certain segments carry richer information for predicting the user’s future viewport and for ensuring smooth playback. We therefore propose the Prediction Contribution Value (PCV), a metric that quantifies each segment’s importance by combining data variation frequency and spread variance. Specifically, the $\mathrm{PCV}(S_{i})$ of segment $S_{i}$ is defined as

$\mathrm{PCV}(S_{i}) = \alpha \cdot \mathrm{Frequency}(S_{i}) + \beta \cdot \mathrm{Variance}(S_{i}),$

(1)

$\mathrm{Frequency}(S_{i}) = \frac{\sum_{j = 1}^{n - 1} I\left(\left| S_{i}[j + 1] - S_{i}[j] \right| > \Delta\right)}{n - 1},$

(2)

$\mathrm{Variance}(S_{i}) = \frac{1}{n} \sum_{j = 1}^{n} \left(S_{i}[j] - \mu_{S_{i}}\right)^{2},$

(3)

where $\alpha$ and $\beta$ are weights with $\alpha + \beta = 1$, $\mathrm{Frequency}(S_{i})$ represents the frequency of significant changes in segment $S_{i}$, $\mathrm{Variance}(S_{i})$ is the variance of the segment, and $\mu_{S_{i}} = \frac{1}{n} \sum_{j = 1}^{n} S_{i}[j]$ is the mean of segment $S_{i}$. Here, $n$ is the number of samples in the segment, $I(\cdot)$ is the indicator function, and $\Delta$ is the threshold above which a change between consecutive samples counts as significant.
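
As an illustration of Equations (1)–(3), the following minimal Python sketch computes the PCV of a single segment. The function names, the default weights, and the default value of the change threshold `delta` are assumptions made for this example and are not specified in the paper. The sketch also uses the raw variance; whether the two terms are normalized to a common scale before weighting is not stated in the text.

```python
from typing import Sequence


def frequency(segment: Sequence[float], delta: float) -> float:
    """Eq. (2): share of consecutive-sample changes larger than delta."""
    n = len(segment)
    if n < 2:
        return 0.0  # assumption: a single sample has no changes
    jumps = sum(1 for a, b in zip(segment, segment[1:]) if abs(b - a) > delta)
    return jumps / (n - 1)


def variance(segment: Sequence[float]) -> float:
    """Eq. (3): population variance of the segment."""
    n = len(segment)
    mu = sum(segment) / n
    return sum((x - mu) ** 2 for x in segment) / n


def pcv(segment: Sequence[float], alpha: float = 0.5,
        beta: float = 0.5, delta: float = 1.0) -> float:
    """Eq. (1): weighted sum of frequency and variance, with alpha + beta = 1."""
    if abs(alpha + beta - 1.0) > 1e-9:
        raise ValueError("alpha and beta must sum to 1")
    return alpha * frequency(segment, delta) + beta * variance(segment)


# Hypothetical yaw samples (degrees): a steady segment and an oscillating one.
steady = [10.0, 10.0, 10.0, 10.0]
oscillating = [0.0, 2.0, 0.0, 2.0]
print(pcv(steady), pcv(oscillating))
```

In this toy input the oscillating segment scores higher on both terms, so it would be the one flagged for stronger protection.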

To further address the need for selective encryption while adhering to stringent latency requirements, we introduce viewport obfuscation, a sophisticated mechanism that intelligently segments the user’s viewport data stream and dynamically applies stronger encryption only to segments identified as having a higher Prediction Contribution Value (PCV). This targeted approach achieves an optimal balance between robust privacy protection and computational efficiency, ensuring that sensitive data—such as detailed clinician head movements, gaze patterns, or patient biometric features—is adequately secured without exceeding the critical latency constraints necessary for real-time telemedicine interactions (e.g., sub-50 ms Motion-to-Photon latency for remote surgery). Instead of employing a brute-force method that encrypts the entire viewport stream, which would introduce significant computational overhead and jeopardize real-time performance, we propose a dimension-based segmentation strategy. This strategy involves independently analyzing each data dimension (e.g., X-axis position, yaw angle, pitch angle) to precisely identify and selectively encrypt “hot segments”—specific temporal or spatial regions within the viewport trajectory that are simultaneously critical for accurate viewport prediction and highly sensitive from a privacy perspective. By focusing encryption efforts on these high-PCV “hot segments,” our approach minimizes overall encryption latency and computational load, while maximizing the protection of the most vulnerable and informative data elements within the user’s viewport stream. In practice, we develop the following key techniques to form our comprehensive viewport obfuscation mechanism.

Dimension-Based Segmentation: Each data dimension (e.g., X-axis position, yaw angle, pitch angle) is segmented and analyzed independently, allowing fine-grained identification of regions critical for accurate predictions and privacy, as different dimensions vary in frequency and importance. This per-dimension analysis ensures that encryption decisions are tailored to the unique behavioral patterns in each axis of movement, avoiding over-encryption of stable or redundant signals.

High-Priority Segment Detection: High-frequency and high-variance segments are prioritized, as frequency tracks dynamic changes—such as rapid head turns or shifts in gaze direction—and variance captures wide behavioral ranges that may reveal sensitive contextual information (e.g., reaction to patient condition). These high-PCV segments are flagged as “hot segments” and marked for enhanced protection, since they simultaneously contribute most to viewport prediction accuracy and pose the greatest privacy risk if exposed.

Selective Encryption for Privacy and Prediction Efficiency: We employ a tiered encryption strategy based on the Prediction Contribution Value $\mathrm{PCV}(S_{i})$. Let $F_{t} = \mathrm{PCV}(S_{i})$ and define two thresholds: $T_{M}$ (medium) and $T_{H}$ (high). Segments are classified and encrypted as follows:

- High Encryption: $F_{t} \geq T_{H}$ → AES-256, 256-bit key, 14 rounds. Protects highly sensitive, predictive segments (e.g., rapid gaze shifts).
- Medium Encryption: $T_{M} \leq F_{t} < T_{H}$ → AES-192, 192-bit key, 12 rounds. Balances security and efficiency for moderately dynamic data.
- Low Encryption: $F_{t} < T_{M}$ → AES-128, 128-bit key, 10 rounds. Minimizes overhead for stable, low-risk segments.

This adaptive scheme focuses encryption on high-PCV “hot segments,” ensuring strong privacy for critical data while keeping end-to-end latency below 10 ms—meeting real-time demands of telemedicine applications like remote surgery.
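
The segmentation and tier selection described above can be sketched as follows. This is an illustrative outline only: the fixed-length windowing, the threshold values, and all function names are assumptions introduced for the example, and the actual encryption call is omitted. The key lengths and round counts per tier are the ones listed in the text.

```python
from typing import Dict, List, Sequence, Tuple

# Tier -> (AES key length in bits, number of rounds), as listed in the text.
TIERS: Dict[str, Tuple[int, int]] = {
    "high": (256, 14),
    "medium": (192, 12),
    "low": (128, 10),
}


def split_into_segments(stream: Sequence[float], window: int) -> List[Sequence[float]]:
    """Cut one viewport dimension (e.g. yaw) into consecutive segments.

    Fixed-length windows are an assumption; the paper does not fix the
    segmentation rule.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [stream[i:i + window] for i in range(0, len(stream), window)]


def select_tier(pcv_value: float, t_m: float, t_h: float) -> str:
    """Map a segment's PCV to an encryption tier using thresholds T_M and T_H."""
    if t_m > t_h:
        raise ValueError("T_M must not exceed T_H")
    if pcv_value >= t_h:
        return "high"
    if pcv_value >= t_m:
        return "medium"
    return "low"


# Hypothetical PCV scores for three segments of one dimension.
for score in (0.05, 0.40, 0.90):
    tier = select_tier(score, t_m=0.3, t_h=0.7)
    key_bits, rounds = TIERS[tier]
    print(f"PCV={score:.2f} -> {tier} (AES-{key_bits}, {rounds} rounds)")
```

Each data dimension would be passed through `split_into_segments` separately, so that a stable axis is not over-encrypted because another axis is changing rapidly.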

3.3. Frame-Wise Encryption

Given the high bandwidth demands and privacy concerns in volumetric video streaming, selectively encrypting parts of each frame balances privacy protection with computational efficiency. Complementing viewport obfuscation, SecureTeleMed integrates frame-wise encryption, which secures only the sensitive parts of each rendered 2D frame during the transmission process. This approach enhances privacy by encrypting areas deemed sensitive based on the user’s viewport projection and relevant content changes, ensuring efficient protection without unnecessary processing.

User Perception Sensitivity Assessment: Building on prior work [17] on user perception preferences in volumetric video, we extend the ROI formula to 2D rendering in order to evaluate the sensitivity distribution in 3D content perception. To define the User Perception Sensitivity for each tile in the rendered 2D frame, we adapt the original formula in [17] so that it works with the projected tiles. The sensitivity score $F_{t}$ for each 2D tile $t$ is calculated as

$F_{t} = \frac{\rho_{t} \cdot f_{gt}}{D_{t}}$

(4)

where $\rho_{t}$ represents the projected density of points from the 3D cube onto the 2D tile $t$, similar to the point density in the 3D cube; $D_{t}$ indicates the distance from the user’s headset to the center of the 3D cube corresponding to tile $t$; and $f_{gt}$ denotes the frequency of the tile falling within the user’s gaze frustum. To assess the user’s sensitivity distribution over each rendered frame, we use the following formula to estimate the frequency of interest for each tile $t$:

$f_{gt} = \frac{\sum_{i = 1}^{N} N_{gt}(i)}{N_{sample}}$

(5)

where $N_{gt}(i)$ is the count of times the user’s gaze aligns with tile $t$ during sample $i$, and $N_{sample}$ is the total number of samples of user behavior.
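
To make Equations (4) and (5) concrete, the short Python sketch below evaluates them for one tile. The function names and the input values are hypothetical, and the units of density and distance are left open because the paper does not specify them.

```python
from typing import Sequence


def gaze_frequency(gaze_counts: Sequence[int], n_sample: int) -> float:
    """Eq. (5): summed per-sample gaze hits on a tile over the total sample count."""
    if n_sample <= 0:
        raise ValueError("n_sample must be positive")
    return sum(gaze_counts) / n_sample


def sensitivity(density: float, gaze_freq: float, distance: float) -> float:
    """Eq. (4): F_t = rho_t * f_gt / D_t."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return density * gaze_freq / distance


# Hypothetical tile: gaze hit it in 3 of 4 behaviour samples, projected
# point density 8.0, and the matching 3D cube is 2.0 units from the headset.
f_gt = gaze_frequency([1, 0, 1, 1], n_sample=4)
print(f_gt, sensitivity(8.0, f_gt, 2.0))
```

The score grows with point density and gaze frequency and shrinks with distance, so a dense, frequently viewed, nearby region ends up with the highest sensitivity.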

Viewport Projection with Region of Interest (ROI) Mapping:

We use gaze and head orientation to map 3D areas of interest onto the 2D frame. Based on Formula (4), each user interest score defines a 2D ROI, prioritizing tiles with higher scores for encryption to protect sensitive regions. After assigning interest scores to each tile in the 2D frame based on its 3D relevance, our SecureTeleMed applies encryption levels tailored to each area’s sensitivity. As shown in Figure 2, each frame is divided into 8 × 8 tiles, with interest scores assigned for sensitivity assessment. High-sensitivity tiles are then selected for encryption, realizing selective tile encryption.
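
The tile layout described above can be illustrated with a small Python sketch that maps a projected gaze point to its tile in the 8 × 8 grid and picks the tiles whose score reaches a threshold. The frame resolution, the threshold, and the function names are assumptions for this example; the 3D-to-2D projection itself is taken as already done.

```python
from typing import List, Tuple

GRID = 8  # the frame is divided into 8 x 8 tiles


def tile_of(x: float, y: float, width: int, height: int,
            grid: int = GRID) -> Tuple[int, int]:
    """Return the (row, col) of the tile containing pixel (x, y)."""
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError("point lies outside the frame")
    col = int(x * grid // width)
    row = int(y * grid // height)
    return row, col


def select_sensitive_tiles(scores: List[List[float]],
                           threshold: float) -> List[Tuple[int, int]]:
    """List the (row, col) of every tile whose score meets the threshold."""
    return [(r, c)
            for r, row in enumerate(scores)
            for c, score in enumerate(row)
            if score >= threshold]


# Hypothetical 1920 x 1080 frame with one high-interest tile at the centre.
scores = [[0.0] * GRID for _ in range(GRID)]
row, col = tile_of(960, 540, 1920, 1080)
scores[row][col] = 0.9
print(select_sensitive_tiles(scores, threshold=0.5))
```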

To achieve a balance between encryption speed and security, a hybrid encryption scheme employing AES [18] and RSA [19] algorithms is utilized, as demonstrated in Figure 3. AES, an efficient symmetric encryption algorithm, encrypts data using a shared key K, with the sender encrypting the message and the receiver decrypting it using the same key. Due to its high-speed encryption capabilities, AES is particularly advantageous for real-time video transmission. Our implementation utilizes AES in Galois/Counter Mode (GCM), which provides both confidentiality and integrity protection through authenticated encryption. GCM mode is particularly suitable for real-time volumetric video streaming where data authenticity is as critical as privacy, as it detects any tampering with the ciphertext. The GCM mode’s ability to perform encryption and authentication in parallel further enhances its efficiency for our low-latency telemedicine applications, ensuring that the cryptographic overhead remains minimal while providing robust security guarantees. To address the security concerns associated with direct key transmission, RSA, an asymmetric encryption algorithm, is incorporated. The sender encrypts the AES key K using the receiver’s public RSA key, generating a ciphertext. The receiver then decrypts this ciphertext using their private RSA key, retrieving the AES key K. Subsequently, the receiver decrypts the AES-encrypted message using K, thus obtaining the original message. This hybrid approach effectively integrates the high-speed data encryption of AES with the secure key exchange of RSA, resulting in efficient and secure communication.

Dynamic Encryption Allocation: We propose the Dynamic Encryption Allocation algorithm to optimize privacy and efficiency by assigning encryption levels based on each tile’s user perception sensitivity score $F_{t}$. In particular, this approach ensures tiles with higher sensitivities receive stronger encryption, while less critical areas use lighter encryption, balancing privacy and real-time performance. The algorithm follows these specific steps:

Sensitivity-Based Encryption Assignment: Each tile in the frame is assigned a user perception sensitivity score $F_{t}$, indicating its importance based on user perception. The algorithm then categorizes these tiles into three encryption levels by comparing each interest score with predefined thresholds $T_{H}$ and $T_{M}$.
- High Encryption is applied to tiles with $F_{t} \geq T_{H}$, using a key length of 256 bits and 14 encryption rounds, ensuring robust security for the most sensitive data.
- Medium Encryption is applied to tiles where $T_{M} \leq F_{t} < T_{H}$, using a key length of 192 bits and 12 rounds, providing a balance of security and efficiency.
- Low Encryption is assigned to tiles with $F_{t} < T_{M}$, using a key length of 128 bits and 10 rounds, minimizing computations for less critical data.
Resource Constraints Adjustment: The algorithm evaluates the encryption plan’s cost against system constraints. If the cost exceeds them, thresholds $T_{H}$ and $T_{M}$ are adjusted to reduce encryption load, ensuring resource limits are met while prioritizing sensitive areas.
Efficiency and Adaptability: The selective encryption algorithm ensures real-time performance by focusing stronger encryption on high-interest areas, reducing computational load while maintaining privacy and efficiency.
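
The steps above can be sketched in Python as follows. This is one possible reading rather than the authors’ implementation: the per-tier cost values, the budget, and the rule of raising both thresholds by a fixed step until the plan fits are all assumptions, since the paper only states that the thresholds are adjusted to reduce the encryption load.

```python
from typing import List, Sequence, Tuple

# Hypothetical relative cost of encrypting one tile at each level.
COST = {"high": 3.0, "medium": 2.0, "low": 1.0}


def assign_levels(scores: Sequence[float], t_m: float, t_h: float) -> List[str]:
    """Step 1: compare each tile's sensitivity score with T_M and T_H."""
    levels = []
    for f_t in scores:
        if f_t >= t_h:
            levels.append("high")
        elif f_t >= t_m:
            levels.append("medium")
        else:
            levels.append("low")
    return levels


def plan_cost(levels: Sequence[str]) -> float:
    """Total cost of an encryption plan under the assumed cost table."""
    return sum(COST[level] for level in levels)


def allocate(scores: Sequence[float], t_m: float, t_h: float,
             budget: float, step: float = 0.05) -> Tuple[List[str], float, float]:
    """Step 2: raise both thresholds until the plan fits the budget."""
    if step <= 0:
        raise ValueError("step must be positive")
    if budget < len(scores) * COST["low"]:
        raise ValueError("budget cannot cover even the lowest level")
    levels = assign_levels(scores, t_m, t_h)
    while plan_cost(levels) > budget:
        t_m += step
        t_h += step
        levels = assign_levels(scores, t_m, t_h)
    return levels, t_m, t_h


# Hypothetical scores for four tiles and a budget tighter than the first plan.
levels, t_m, t_h = allocate([0.9, 0.8, 0.5, 0.1], t_m=0.4, t_h=0.7, budget=7.0)
print(levels, plan_cost(levels))
```

Because the thresholds only move upward, a tile with a higher score never ends up at a lower level than a tile with a lower score, which keeps the most sensitive areas prioritized as the load is reduced.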

4. Experiment

This section evaluates the performance of SecureTeleMed through systematic experiments, focusing on two key metrics: encryption/decryption efficiency (measured by processing time) and privacy-preserving capability (quantified by Privacy Leakage Rate). It describes the experimental setup, presents results from comparative analyses of default vs. optimized encryption configurations, and discusses how the findings demonstrate SecureTeleMed’s ability to reduce privacy leakage by 89% while maintaining sub-50 ms latency—validating its suitability for real-time telemedicine applications.

4.1. Experiment Setup

We evaluate the performance of SecureTeleMed in two major aspects: encryption/decryption efficiency and privacy-preserving performance. The evaluation metrics are defined as follows:

Encryption/Decryption Time: We measure the time taken to apply and reverse encryption for viewport obfuscation and frame-wise encryption, testing the scheme’s computational efficiency and its impact on real-time performance.
Privacy Leakage Rate ($PLR$): We calculate the proportion of sensitive tiles left unencrypted or insufficiently encrypted in each frame, for reasons such as inappropriately defined thresholds or insufficient sensitivity detection. A lower leakage rate indicates stronger privacy protection. The $PLR$ is calculated as

$PLR = \frac{\text{Unencrypted Sensitive Tiles}}{\text{Total Sensitive Tiles}}$

(6)
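
As a worked illustration of Equation (6), the Python sketch below computes the PLR for a single frame from two sets of tile coordinates. The function name and the example tiles are hypothetical. Returning 0 for a frame with no sensitive tiles is an assumption, and deciding which tiles count as sufficiently encrypted is left to the caller.

```python
from typing import Iterable, Tuple

Tile = Tuple[int, int]


def privacy_leakage_rate(sensitive_tiles: Iterable[Tile],
                         protected_tiles: Iterable[Tile]) -> float:
    """Eq. (6): share of sensitive tiles that were not adequately encrypted."""
    sensitive = set(sensitive_tiles)
    if not sensitive:
        return 0.0  # assumption: nothing sensitive means nothing can leak
    leaked = sensitive - set(protected_tiles)
    return len(leaked) / len(sensitive)


# Hypothetical frame: four sensitive tiles, one of which was missed.
sensitive = [(0, 0), (0, 1), (1, 1), (2, 2)]
protected = [(0, 0), (1, 1), (2, 2), (5, 5)]
print(privacy_leakage_rate(sensitive, protected))
```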

4.2. Experimental Assessment

We illustrate the encryption/decryption times in Figure 4 and Figure 5. In the default configuration (Figure 4), encryption and decryption times increase with higher key lengths, especially for 256-bit keys. The optimized configuration (Figure 5), which applies selective encryption to sensitive tiles, significantly reduces processing times across all key lengths. This approach maintains privacy protection while minimizing latency, demonstrating the effectiveness of selective encryption for real-time volumetric video streaming.

As for the privacy-preserving performance, we assess the Privacy Leakage Rate using a test video pack from the volumetric video dataset FSVVD [20] that lasts 360 s in total. The average Privacy Leakage Rate (PLR) for various activities, including ‘Presenting,’ ‘Answering Questions,’ ‘Drinking,’ and ‘Interview,’ is shown in Figure 6. Across all activities, the PLR remains below 10%, indicating effective privacy protection by SecureTeleMed. Slight variations in PLR among activities reflect different movement dynamics—for instance, more frequent head movements during ‘Answering Questions’ lead to temporarily higher sensitivity region coverage. As the experiment duration increases, user perception sensitivity scores are detected more accurately through adaptive optimization, and encryption parameters are dynamically optimized based on accumulated behavioral patterns. This refinement reduces the likelihood of missing sensitive tiles during selective encryption, thereby minimizing unnecessary exposure and contributing to a progressively lower PLR over time. The stabilization of PLR with prolonged observation further demonstrates the system’s ability to learn and adapt to individual user behavior. Overall, these results highlight SecureTeleMed’s effectiveness in maintaining low privacy leakage while supporting real-time, adaptive protection in volumetric telemedicine streaming.

5. Discussion

This section analyzes the limitations of SecureTeleMed and outlines future research directions. It first identifies key constraints, including challenges in user interest mapping precision (e.g., noisy input data, generalized sensitivity models) and latency risks introduced by encryption overhead. It then proposes actionable improvements, such as enhanced noise filtering, personalized sensitivity modeling, and hardware-accelerated encryption, to strengthen both performance and privacy protection for broader telemedicine applicability.

5.1. Shortcomings and Limitations

5.1.1. Precision of User Interest Mapping

Accurate user interest mapping is crucial for achieving effective and efficient ROI-aware encryption. However, several factors can impact its precision:

Noisy Input Data: Gaze and head movement data often contain noise due to sensor inaccuracies or natural variability in user behavior, which can lead to errors in calculating interest scores.

Generalized Sensitivity Model: The current sensitivity model assumes uniform privacy preferences across users, which may not reflect individual differences or context-specific privacy needs.

Dynamic User Behavior: Rapid shifts in visual attention or head orientation may occur faster than the system can update encryption priorities, potentially resulting in misaligned encryption boundaries and reduced privacy protection.

5.1.2. Latency Constraints

Real-time volumetric video streaming requires ultra-low latency—typically under 50 ms—to ensure a seamless and immersive experience. However, integrating dynamic encryption introduces additional challenges:

Encryption Overhead: Calculating real-time interest scores and determining adaptive encryption levels increases computational load, which can introduce delays in frame processing.

Frame Complexity: Frames with high spatial or temporal complexity may require encrypting a larger number of tiles at higher priority, increasing processing time and risking latency violations.

To address these limitations, future work could explore enhanced noise filtering techniques, adaptive sensitivity modeling, and hardware-accelerated encryption methods to maintain both performance and privacy protection.
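The interaction between encryption level and the latency budget can be sketched as follows. This is a hypothetical scheduling heuristic, not the mechanism evaluated in this paper: the per-tile costs in `COST_MS`, the three-level scale, and the function `fit_levels_to_budget` are all assumed values and names chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical per-tile encryption cost in milliseconds for each level
# (0 = none, 1 = baseline, 2 = strong). Real costs must be measured.
COST_MS: Dict[int, float] = {0: 0.0, 1: 0.5, 2: 1.5}


def fit_levels_to_budget(
    tiles: List[Tuple[float, int]],
    budget_ms: float,
    min_level: int = 1,
) -> Tuple[List[int], float]:
    """Downgrade low-interest tiles until the frame fits the budget.

    tiles holds (interest_score, requested_level) per tile. Tiles are never
    pushed below min_level, so the returned total can still exceed budget_ms;
    the caller has to check it.
    """
    levels = [level for _, level in tiles]
    total = sum(COST_MS[level] for level in levels)
    # Visit tiles from the lowest to the highest interest score.
    order = sorted(range(len(tiles)), key=lambda i: tiles[i][0])
    for i in order:
        while total > budget_ms and levels[i] > min_level:
            total -= COST_MS[levels[i]] - COST_MS[levels[i] - 1]
            levels[i] -= 1
    return levels, total


if __name__ == "__main__":
    frame = [(0.9, 2), (0.2, 2), (0.5, 2)]
    print(fit_levels_to_budget(frame, budget_ms=3.5))
```

Keeping a floor of `min_level` reflects one possible design choice: when a complex frame cannot meet the budget, the sketch reports the overrun rather than leaving any tile unencrypted.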

5.2. Future Work

5.2.1. Adaptive User Sensitivity Assessment

A promising direction is to develop personalized sensitivity models that account for individual user preferences and contextual requirements. By incorporating machine learning techniques or user feedback mechanisms, the system could dynamically adjust encryption priorities based on observed behavior, improving both privacy and user experience. Such models would consider individual privacy preferences, role-based requirements (e.g., patient vs. clinician), and context-specific needs (e.g., routine consultation vs. sensitive diagnostic procedure), and could use behavioral analysis to calibrate sensitivity thresholds automatically so that each user receives protection suited to the clinical context. In addition, configurable privacy settings would let users manually adjust their preferred protection levels, giving them greater control over their data while preserving the system’s adaptive capabilities.
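One way such a model might be structured is shown below. This is only a sketch of the idea: the role and context names, the threshold values, and the `SensitivityProfile` class are hypothetical, and a learned model could replace the lookup tables.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical thresholds on a 0..1 interest score. A lower threshold means
# more regions are treated as sensitive and encrypted at a higher level.
ROLE_THRESHOLD: Dict[str, float] = {"patient": 0.4, "clinician": 0.6}
CONTEXT_OFFSET: Dict[str, float] = {
    "routine_consultation": 0.0,
    "sensitive_diagnostic": -0.2,
}


@dataclass
class SensitivityProfile:
    role: str
    context: str
    user_override: Optional[float] = None  # manual setting, if the user chose one

    def threshold(self) -> float:
        """Return the interest-score threshold, clamped to [0, 1]."""
        if self.user_override is not None:
            value = self.user_override
        else:
            value = ROLE_THRESHOLD[self.role] + CONTEXT_OFFSET[self.context]
        return min(1.0, max(0.0, value))


def needs_strong_encryption(interest_score: float, profile: SensitivityProfile) -> bool:
    """True when a region's interest score reaches the profile's threshold."""
    return interest_score >= profile.threshold()
```

In this sketch the manual override takes precedence over the role and context defaults, which corresponds to the configurable privacy settings described above.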

5.2.2. Real-World Testing in Diverse Applications

Expanding the evaluation of SecureTeleMed beyond controlled environments to real-world applications—such as remote surgical consultations, virtual reality-based therapy, and collaborative diagnostics—can provide valuable insights into system performance under varying conditions. These use cases may reveal distinct patterns in user interaction and privacy expectations, enabling further refinement of interest score thresholds and encryption policies to enhance robustness and applicability across domains.

6. Conclusions

In this work, we introduced SecureTeleMed, a privacy-preserving volumetric video streaming system designed to address the dual challenges of privacy protection and computational efficiency in real-time streaming scenarios. The system employs a dual-track encryption mechanism. The first track, viewport obfuscation, selectively secures user trajectory data to protect behavioral patterns and is paired with prediction-based selective encryption, which adjusts encryption intensity according to the importance of viewport segments. The second track, frame-wise encryption guided by user perception sensitivity, ensures that only high-interest regions of the rendered 2D frame are encrypted at higher levels. This targeted approach minimizes unnecessary computational overhead, preserves bandwidth, and enhances system responsiveness.

Author Contributions

Conceptualization, K.H. and S.Q.; methodology, K.H., D.M. and S.Q.; software, K.H. and D.M.; validation, K.H., D.M. and S.Q.; formal analysis, K.H. and D.M.; investigation, K.H. and S.Q.; resources, K.H. and S.Q.; data curation, K.H.; writing—original draft preparation, K.H. and D.M.; writing—review and editing, S.Q.; visualization, K.H. and D.M.; supervision, S.Q.; project administration, K.H.; funding acquisition, S.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by The Chinese University of Hong Kong, grant number 4055212.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors thank the Institute of Medical Intelligence and XR, CUHK, for technical support.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Valenzise, G.; Alain, M.; Zerman, E.; Ozcinar, C. Immersive Video Technologies; Academic Press: Cambridge, MA, USA, 2022.
Qiu, S.; Xie, B.; Liu, Q.; Heng, P.A. Creating Virtual Environments with 3D Gaussian Splatting: A Comparative Study. In Proceedings of the 2025 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), Saint Malo, France, 8–12 March 2025; pp. 1332–1333.
Miller, M.R.; Herrera, F.; Jun, H.; Landay, J.A.; Bailenson, J.N. Personal Identifiability of User Tracking Data during Observation of 360-degree VR Video. Sci. Rep. 2020, 10, 17404.
Mustra, M.; Delac, K.; Grgic, M. Overview of the DICOM standard. In Proceedings of the 2008 50th International Symposium ELMAR, Zadar, Croatia, 10–12 September 2008; IEEE: New York, NY, USA, 2008; Volume 1, pp. 39–44.
Jarrold, W.; Mundy, P.; Gwaltney, M.; Bailenson, J.; Hatt, N.; McIntyre, N.; Kim, K.; Solomon, M.; Novotny, S.; Swain, L. Social Attention in a Virtual Public Speaking Task in Higher Functioning Children with Autism. Autism Res. 2013, 6, 393–410.
Loucks, L.; Yasinski, C.; Norrholm, S.D.; Maples-Keller, J.; Post, L.; Zwiebach, L.; Fiorillo, D.; Goodlin, M.; Jovanovic, T.; Rizzo, A.A.; et al. You can do that?!: Feasibility of Virtual Reality Exposure Therapy in the Treatment of PTSD due to Military Sexual Trauma. J. Anxiety Disord. 2019, 61, 55–63.
Werner, P.; Rabinowitz, S.; Klinger, E.; Korczyn, A.D.; Josman, N. Use of the Virtual Action Planning Supermarket for the Diagnosis of Mild Cognitive Impairment. Dement. Geriatr. Cogn. Disord. 2009, 27, 301–309.
Tarnanas, I.; Schlee, W.; Tsolaki, M.; Müri, R.; Mosimann, U.; Nef, T. Ecological Validity of Virtual Reality Daily Living Activities Screening for Early Dementia: Longitudinal Study. JMIR Serious Games 2013, 1, e2778.
Schwarz, S.; Preda, M.; Baroncini, V.; Budagavi, M.; Cesar, P.; Chou, P.A.; Cohen, R.A.; Krivokuća, M.; Lasserre, S.; Li, Z.; et al. Emerging MPEG standards for point cloud compression. IEEE J. Emerg. Sel. Top. Circuits Syst. 2018, 9, 133–148.
Han, B.; Liu, Y.; Qian, F. ViVo: Visibility-aware mobile volumetric video streaming. In Proceedings of the 26th Annual International Conference on Mobile Computing and Networking, London, UK, 21–25 September 2020; pp. 1–13.
Ekeland, A.G.; Bowes, A.; Flottorp, S. Effectiveness of telemedicine: A systematic review of reviews. Int. J. Med. Inform. 2010, 79, 736–771.
Adeghe, E.P.; Okolo, C.A.; Ojeyinka, O.T. A review of emerging trends in telemedicine: Healthcare delivery transformations. Int. J. Life Sci. Res. Arch. 2024, 6, 137–147.
Maha, C.C.; Kolawole, T.O.; Abdul, S. Transforming mental health care: Telemedicine as a game-changer for low-income communities in the US and Africa. GSC Adv. Res. Rev. 2024, 19, 275–285.
Andrade, R. Privacy-Preserving Face Detection: A Comprehensive Analysis of Face Anonymization Techniques. Master’s Thesis, Universidade do Porto, Porto, Portugal, 2024.
Riad, A.K.I.; Barek, M.A.; Rahman, M.M.; Akter, M.S.; Islam, T.; Rahman, M.A.; Mia, M.R.; Shahriar, H.; Wu, F.; Ahamed, S.I. Enhancing HIPAA Compliance in AI-driven mHealth Devices Security and Privacy. In Proceedings of the 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC), Osaka, Japan, 2–4 July 2024; IEEE: New York, NY, USA, 2024; pp. 2430–2435.
Stauffert, J.P.; Niebling, F.; Latoschik, M.E. Latency and cybersickness: Impact, causes, and measures. A review. Front. Virtual Real. 2020, 1, 582204.
Hu, K.; Yang, H.; Jin, Y.; Liu, J.; Chen, Y.; Zhang, M.; Wang, F. Understanding User Behavior in Volumetric Video Watching: Dataset, Analysis and Prediction; ACM: New York, NY, USA, 2023.
Rijmen, V.; Daemen, J. Advanced encryption standard. In Federal Information Processing Standards Publications; National Institute of Standards and Technology: Gaithersburg, MD, USA, 2001; Volume 19, p. 22.
Rivest, R.L.; Shamir, A.; Adleman, L. A method for obtaining digital signatures and public-key cryptosystems. Commun. ACM 1978, 21, 120–126.
Hu, K.; Jin, Y.; Yang, H.; Liu, J.; Wang, F. FSVVD: A dataset of full scene volumetric video. In Proceedings of the 14th Conference on ACM Multimedia Systems, Vancouver, BC, Canada, 7–10 June 2023; pp. 410–415.

Figure 1. SecureTeleMed workflow.

Figure 2. Viewport projection with region of interest (ROI) mapping.

Figure 3. Workflow of AES and RSA hybrid approach.

Figure 4. Default encryption/decryption time.

Figure 5. Optimized encryption/decryption time.

Figure 6. Privacy leakage rate with average.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Hu, K.; Ma, D.; Qiu, S. SecureTeleMed: Privacy-Preserving Volumetric Video Streaming for Telemedicine. Electronics 2025, 14, 3371. https://doi.org/10.3390/electronics14173371

