Security-Aware Adaptive Video Streaming via Watermarking: Tackling Time-to-First-Byte Delays and QoE Issues in Live Video Delivery Systems

Kalan, Reza; Canatalay, Peren Jerfi; Karsli, Emre

doi:10.3390/computers14100404


by Reza Kalan ¹,*, Peren Jerfi Canatalay ¹ and Emre Karsli ²

¹ Department of Computer Engineering, Istinye University, Istanbul 34396, Türkiye
² R&D Department, Digiturk beIN Media Group, Istanbul 34353, Türkiye
* Author to whom correspondence should be addressed.

Computers 2025, 14(10), 404; https://doi.org/10.3390/computers14100404

Submission received: 2 August 2025 / Revised: 9 September 2025 / Accepted: 9 September 2025 / Published: 23 September 2025

(This article belongs to the Special Issue Multimedia Data and Network Security)


Abstract

Illegal broadcasting is one of the primary challenges for Over-the-Top (OTT) service providers. Watermarking is a method used to trace illegal redistribution of video content. However, watermarking introduces processing overhead due to the embedding of unique patterns into the video content, which results in additional latency. End-to-end network latency, caused by network congestion or heavy load on the origin server, can slow data transmission and increase the time it takes for a segment to reach the client. This paper addresses 5xx errors (e.g., 503, 504) at the Content Delivery Network (CDN) in real-world video streaming platforms, which can negatively impact Quality of Experience (QoE), particularly when watermarking techniques are employed. To address the performance issues caused by the integration of watermarking technology, we enhanced the system architecture by introducing and optimizing a shield cache in front of the packager at the origin server and fine-tuning the CDN configuration. These optimizations significantly reduced the processing load on the packager, minimized latency, and improved overall content delivery. As a result, we achieved a 6% improvement in the Key Performance Indicator (KPI), reflecting enhanced system stability and video quality.

Keywords:

video streaming; watermarking; security; delay; CDN; QoE

1. Introduction

HTTP Adaptive Streaming (HAS) is widely used for video content delivery, allowing end users to connect to edge networks and stream video adaptively based on network throughput. With the popularity of video streaming services and the increasing demand for scalability, Content Delivery Networks (CDNs) provide an effective solution by placing video content near the network edge [1], which helps minimize access latency and reduce unnecessary network traffic. Applying security measures, such as Digital Rights Management (DRM) and watermarking (WM), to video content to prevent illegal broadcasting introduces additional challenges along the end-to-end delivery path. For example, forensic watermarking requires two copies of the same content (“A” and “B”) to be created at the origin. This not only requires additional processing and packaging time, which can increase delay, but also demands extra storage.

Quality of Experience (QoE) in video streaming refers to the overall satisfaction of end users with the streaming service, involving both technical performance and user perception. It is a critical metric that evaluates how well video playback aligns with user expectations, considering factors such as start-up delay, buffering, video resolution, and playback smoothness. To enhance QoE in video streaming, key mechanisms include (i) stability—minimizing bitrate fluctuations; (ii) fairness—ensuring equitable bandwidth distribution; and (iii) efficient resource utilization—optimizing network usage without compromising performance. In the absence of sufficient bandwidth, the client experience suffers due to poor video quality, buffering, and bitrate fluctuations.

When discussing HTTP or web-based applications, it is important to consider not only the initial interaction but also the user’s subsequent interactions, which significantly impact overall QoE [2]. For example, in video streaming, each video segment is expected to arrive without delay to ensure smooth playback. A Time-to-First-Byte (TTFB) error [3,4] occurs when there is a delay in receiving the first byte after a client requests a video segment. Technically, this is not an error in itself; rather, a slow response (the server taking too long to send the first byte) becomes a performance issue. In a CDN configuration, if the origin server does not respond in time, a “first byte timeout” may trigger a 5xx error (e.g., 503, 504, or 524). High traffic and complex data processing, such as watermarking, often degrade origin performance. Therefore, an efficient caching mechanism optimized for origin acceleration can reduce the load on the origin server and significantly accelerate video delivery.

End-to-end latency is influenced by the entire content delivery chain, including content capture, encoding, packaging, encryption, and CDN delivery [5]. Implementing content protection methods adds processing overhead on both the origin media server and the delivery network, which can increase response times. This study addresses the TTFB issue, which often leads to 5xx error codes. These errors, typically caused by slow response times, negatively impact QoE. By enhancing the existing system architecture, we reduced the occurrence of 5xx errors on a watermark-integrated live video streaming platform and improved video performance.

The remainder of this study is organized as follows. Section 2 discusses background and related works. Section 3 introduces the system architecture, followed by the proposed method and enhancement in Section 4. The experimental results are presented in Section 5. Finally, the paper is concluded in Section 6.

2. Background and Related Works

2.1. Adaptive Streaming Overview

Adaptive video streaming is a technology used in online video distribution that adjusts video quality in real time based on the end-user’s internet connection speed and device capabilities. This optimizes playback performance, ensuring minimal buffering and a smooth viewing experience. HTTP Live Streaming (HLS) and Dynamic Adaptive Streaming over HTTP (DASH) are the two main adaptive streaming protocols widely adopted by Over-the-Top (OTT) service providers [6]. Both technologies aim to minimize buffering and latency, ensuring smooth playback and an optimal user experience across various network conditions and platforms.

Although both technologies serve similar purposes [7] and provide metadata—such as segment durations, available bitrates, codecs, and encoding parameters—they differ in terms of architecture and flexibility. HLS uses playlist files (m3u8), while DASH utilizes a Media Presentation Description (MPD) manifest. One of the key advantages of DASH is its codec-agnostic design, which supports a wide range of modern codecs, such as Advanced Video Coding (AVC/H.264), High Efficiency Video Coding (HEVC/H.265), Video Processor 9 (VP9), and AOMedia Video 1 (AV1). In contrast, HLS has traditionally relied on the MPEG-2 TS format and has stronger native support on Apple devices, although newer versions also support fragmented MP4 (fMP4). This makes DASH more attractive in terms of cross-platform scalability, especially in environments with diverse device ecosystems. It is worth recalling that H.264 encoding is particularly suitable and broadly compatible with legacy devices, due to its lower processing requirements.

From a security perspective, DASH supports Common Encryption (CENC), which allows multiple DRM systems such as Google’s Widevine, Microsoft’s PlayReady, and Apple’s FairPlay to work with a single encrypted stream. In contrast, HLS is primarily tied to Apple’s FairPlay DRM, making it less flexible for multi-platform DRM deployments. This broader DRM compatibility makes DASH a more versatile choice for content providers aiming to reach diverse devices and platforms while maintaining robust content protection.

Figure 1 illustrates a conceptual overview of end-to-end Adaptive Bitrate (ABR) video streaming technology and its applications. On the server side, a video is encoded in multiple bitrates (or representations), fed to the origin media server, and distributed to end users via the CDN. Each video is split into smaller segments, typically ranging from 2 to 6 s. The manifest file contains metadata to help the client adapt to the appropriate video representation. On the client side, video playback is initiated by downloading the manifest file. Due to the dynamic nature of edge networks, the client constantly adjusts the video bitrate to minimize buffering. When a client requests a video segment, the CDN first checks whether it is available at the edge. If the segment is cached from a prior request, the CDN serves it quickly. If not, the request is passed to the origin server. Once the origin server prepares the requested segment in the correct format (HLS or DASH), it returns the segment to the CDN, which then delivers it to the client and caches it at the edge for future use.
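The client-side adaptation described above can be illustrated with a simple throughput-based rule: pick the highest representation whose bitrate fits within a safety margin of the measured throughput. The sketch below is not the ABR algorithm of any particular player; the 0.8 safety factor and the bitrate ladder are assumptions chosen for illustration.

```python
from typing import Sequence


def select_representation(ladder_kbps: Sequence[int],
                          measured_throughput_kbps: float,
                          safety_factor: float = 0.8) -> int:
    """Highest bitrate in the ladder that fits within a safety margin of the
    measured throughput; falls back to the lowest rung if nothing fits."""
    budget = measured_throughput_kbps * safety_factor
    candidates = [b for b in sorted(ladder_kbps) if b <= budget]
    return candidates[-1] if candidates else min(ladder_kbps)


# Hypothetical bitrate ladder (kbps); not the configuration used in this study.
LADDER = [800, 1800, 3500, 6000]
print(select_representation(LADDER, measured_throughput_kbps=5000))  # 3500
print(select_representation(LADDER, measured_throughput_kbps=500))   # 800
```

Production players usually also take the buffer level into account, which this throughput-only sketch ignores.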

2.2. Secure Streaming with Watermarking

Watermarking is a technique used to embed imperceptible information within digital content [9,10] to support security functions such as authentication, content protection [11], and traceability. Watermarking allows each client to receive uniquely marked content, so that redistribution can be traced even when it originates from anonymous sources [12,13]. Whereas watermarking protects intellectual property and proves ownership in piracy cases, DRM ensures authorized access. Watermarking is classified into server-side and client-side techniques. In both, the main considerations are transparency and ensuring a unique pattern for each client. Watermarking solutions alternate between watermarked copies (“A” and “B”) to maintain unique sequences. In server-side watermarking, customizing playlists at the origin or CDN level demands considerable resources, whereas client-side solutions introduce potential security vulnerabilities. Furthermore, client-side watermarking requires regular updates for each device.

Figure 2 illustrates our server-side watermarking solution implemented in a live video streaming platform. In the video streaming chain, the client is authorized by the Service Delivery Platform (SDP), which connects to the OTT service and provides a short “token” for CDN access. The CDN verifies the token’s validity and issues a long token, allowing the client to start streaming. Each client is assigned a unique streaming pattern, consisting of a sequence of “A” and “B” copies. The architecture shows that the origin media server must create two separate copies (“A” and “B”) for each video segment and representation, considering different video formats. This increases processing time and, consequently, slows down response times.
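To make the idea of a per-client “A”/“B” sequence concrete, the following sketch derives one variant choice per segment from a session identifier using HMAC-SHA256. This is a hypothetical scheme for illustration only: the paper does not describe how the watermarking solution derives the sequence, and commercial systems typically encode a payload (often with error correction) rather than using a raw hash. The secret, session ID, and function names are assumptions.

```python
import hashlib
import hmac


def ab_variant(secret: bytes, session_id: str, segment_number: int) -> str:
    """Choose the "A" or "B" copy of one segment for one client session.

    The choice is a keyed pseudo-random bit (HMAC-SHA256), so a given session
    always receives the same sequence, while different sessions receive
    different sequences with high probability.
    """
    message = f"{session_id}:{segment_number}".encode("utf-8")
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    return "A" if (digest[0] & 1) == 0 else "B"


def ab_pattern(secret: bytes, session_id: str, first_segment: int, count: int) -> str:
    """A/B sequence for `count` consecutive segments starting at `first_segment`."""
    return "".join(
        ab_variant(secret, session_id, n)
        for n in range(first_segment, first_segment + count)
    )


SECRET = b"hypothetical-server-secret"  # placeholder key, not a real deployment value
print(ab_pattern(SECRET, "session-123", first_segment=100, count=16))
```

Because the choice is deterministic per session and segment, any server or CDN node holding the same secret would make the same choice without shared state.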

2.3. Related Works

Multimedia watermarking has been the subject of extensive academic and industry research. Video streams can be analyzed across multiple dimensions, including visual content, audio tracks, and metadata, each serving as a distinct channel for embedding watermarks, as examined in [14]. However, this research focuses specifically on watermarks embedded within the visual component of the stream, which is treated as the primary carrier signal. In [15], a practical design for user-specific watermarking is proposed to track unauthorized distributors of digital video content. Its fundamental design objective is to minimize the computational complexity of the embedding process, thus enhancing its feasibility for real-world applications. The authors in [16] review watermarking techniques tailored for HEVC/H.265, with a focus on their application in authentication and copyright protection. The paper identifies key challenges, such as compression artifacts, and proposes potential research directions to improve the robustness of watermarking methods in HEVC/H.265 video streams. The authors in [17] propose a method called IToV, which extends deep learning-based watermarking techniques to video content. The approach uses temporal information and depthwise convolutions to enhance the efficiency and robustness of watermark embedding in video streams. Additionally, forensic A/B watermarking was implemented for ABR streaming in [18], enabling robust session-level tracking and piracy detection without requiring extensive server-side modifications.

Overall, while watermarking algorithms based on transformations such as Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), Dual-Tree Complex Wavelet Transform (DTCWT), and Singular Value Decomposition (SVD) have demonstrated promising performance in terms of robustness and imperceptibility, existing research has largely focused on watermark embedding within the video signal itself. However, these studies often overlook critical practical considerations such as the scalability of watermarking systems, the complexity of integration into existing video delivery pipelines, and the processing overhead introduced by watermarking operations. These factors become especially significant during peak traffic periods, where added latency can lead to noticeable buffering and degraded user experience in ABR streaming scenarios.

Table 1 summarizes recent methods employed for robust watermarking. Although these studies offer valuable insights into the application of watermarking techniques in video streaming, none have explored their real-world deployment within a distribution platform, particularly in the context of live events, which are characterized by highly dynamic traffic and a diverse range of video devices. In this study, we do not introduce a new watermarking technique; instead, we focus on the critical challenge of integrating watermarking into a live video streaming platform, including the TTFB issue, which negatively affects the video quality experienced by users (QoE). This practical integration issue is often overlooked in the academic literature, which tends to focus on the theoretical or algorithmic aspects of watermarking rather than its impact on streaming performance in live OTT platforms. This specific gap is the focus of our research.

3. System Architecture

In this work, our aim is to address the issue of increased TTFB, which is primarily caused by the latency introduced through the heavy processing load associated with integrating watermarking into a live video streaming platform. To this end, we first delve into the technical architecture and operational aspects of an end-to-end video delivery system.

3.1. Workflow on Origin Media Servers

Video streaming platforms deliver live and Video-on-Demand (VoD) services, with certain processing tasks such as transcoding, encryption, and packaging that need to be performed prior to content distribution.

Ingest: The ingest phase involves transferring high-quality source content (often referred to as the mezzanine or master video) into the streaming platform’s processing pipeline. These master files are typically encoded at high bitrate and resolution, using minimal compression, to preserve maximum visual fidelity for downstream processing tasks.
Transcoding: Converting a video to a different codec (e.g., from H.264 to HEVC), resolution, or bitrate (e.g., producing 480p, 720p, and 1080p renditions) to support playback on different clients (devices).
Just-in-time packaging: On-the-fly packaging, also known as Just-in-Time Packaging (JITP), is a real-time process that converts pre-transcoded video files into streaming formats such as HLS, DASH, and the Common Media Application Format (CMAF) in response to client requests. When a client initiates playback, the packager (i) generates the appropriate manifest file (e.g., .m3u8 for HLS or .mpd for DASH), detailing available resolutions, bitrates, and segment locations, and (ii) wraps the video segments into the correct container (e.g., MPEG-2 TS (.ts) for HLS or fragmented MP4 for DASH).
Manifest update and delivery: In live streaming, the manifest—which contains metadata about available video segments, bitrates, codecs, and timing information—is continuously updated to include newly generated segments, allowing the client’s player to seamlessly fetch the latest content.
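As an illustration of the manifest-update step, the sketch below builds an HLS media playlist for a sliding live window: each update lists only the newest segments and advances EXT-X-MEDIA-SEQUENCE accordingly. Segment names, the window size, and the function name are hypothetical; the A/B watermark variants and DASH MPD generation are not shown.

```python
import math
from typing import List


def live_media_playlist(segment_uris: List[str], first_sequence: int,
                        segment_duration_s: float, window: int = 5) -> str:
    """Render an HLS media playlist for a sliding live window.

    Only the newest `window` segments are listed. EXT-X-MEDIA-SEQUENCE is the
    sequence number of the first listed segment, so it grows as old segments
    slide out of the window. No EXT-X-ENDLIST tag is written because the
    event is still live, which tells the player to keep reloading the playlist.
    """
    if window < 1:
        raise ValueError("window must list at least one segment")
    listed = segment_uris[-window:]
    media_sequence = first_sequence + len(segment_uris) - len(listed)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",  # version 3+ allows decimal EXTINF durations
        f"#EXT-X-TARGETDURATION:{math.ceil(segment_duration_s)}",
        f"#EXT-X-MEDIA-SEQUENCE:{media_sequence}",
    ]
    for uri in listed:
        lines.append(f"#EXTINF:{segment_duration_s:.3f},")
        lines.append(uri)
    return "\n".join(lines) + "\n"


# Hypothetical live stream: segments 100..107 exist, 6 s each.
segments = [f"720p/seg_{n}.ts" for n in range(100, 108)]
print(live_media_playlist(segments, first_sequence=100, segment_duration_s=6.0))
```

With eight segments and a five-segment window, the playlist starts at sequence number 103; a client reloading it later sees the window move forward as new segments are appended.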

To improve on-the-fly packaging performance, we introduced a shield cache in front of the packager to alleviate load during traffic peaks when request volumes surge toward origin servers. This mechanism reduces latency at the origin server, thus accelerating response times and improving system efficiency.
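The effect of the shield cache can be sketched as a bounded least-recently-used (LRU) cache that calls the packager only on a miss. This is a minimal, single-threaded sketch under assumed names (ShieldCache, the packager callback, and the cache key layout are hypothetical), not the platform’s actual implementation; a production shield is typically a dedicated caching proxy.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class ShieldCache:
    """Bounded LRU cache placed in front of a just-in-time packager.

    A miss calls the packager once and stores the packaged bytes; later
    requests for the same key are served from memory without repackaging.
    """

    def __init__(self, packager: Callable[[Hashable], bytes], capacity: int):
        self._packager = packager
        self._capacity = capacity
        self._store: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> bytes:
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._store[key]
        self.misses += 1
        data = self._packager(key)  # expensive: package (and watermark) the segment
        self._store[key] = data
        if len(self._store) > self._capacity:
            self._store.popitem(last=False)  # evict the least recently used entry
        return data


# Demo with a fake packager that records how often it is called.
calls = []

def fake_packager(key):
    calls.append(key)
    return repr(key).encode()

shield = ShieldCache(fake_packager, capacity=100)
for _ in range(3):
    # Hypothetical key: (representation, segment number, A/B variant, format).
    shield.get(("720p", 42, "A", "hls"))
print(len(calls), shield.hits, shield.misses)  # 1 2 1
```

The capacity bound reflects the limited storage of a shield positioned in front of the packager; the hit ratio then depends on how well the capacity matches the live window.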

3.2. Content Delivery and Caching at Edge

CDNs are designed to distribute content as close to end-users as possible. By connecting users to the nearest edge server, CDNs significantly reduce latency and improve performance, allowing faster and more efficient data access. In addition to accelerating content delivery, CDNs enhance service availability during traffic spikes, strengthen security against attacks, and reduce core network traffic. Pre-fetching and cache warm-up are also important CDN advantages, as they proactively load content closer to users before it is requested, further reducing latency and improving user experience.

Figure 3 illustrates the general architecture of a video delivery system, where caching content across different CDN tiers enhances delivery efficiency [32]. From a technical perspective, when a CDN-edge server receives a request, it first checks its cache. In the case of a cache miss, such as a request for segment “S3” from representation “R3” that is not available in either the Tier 1 or the Tier 2 cache, the CDN-edge forwards the request to the origin media servers, which are typically located far away. From that point on, all incoming requests from other clients for the same video segment are queued until the CDN-edge receives a response from the origin server. As soon as the CDN-edge begins to receive the requested segment from the origin server, it serves the waiting clients while simultaneously caching the content for future requests. Because Tier 1 storage costs more than Tier 2 storage, CDNs employ a hybrid approach (utilizing both tiers) to strike an optimal balance between cost, performance, and system reliability.
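The queuing behavior described above, in which concurrent requests for the same missing segment wait for a single origin fetch, is often called request collapsing or coalescing. The following in-process sketch illustrates it with Python threads; the class name and key format are hypothetical, and the Tier 1/Tier 2 hierarchy is collapsed into a single cache for brevity.

```python
import threading
import time
from typing import Callable, Dict


class CoalescingEdge:
    """In-memory edge cache that collapses concurrent misses for the same key.

    The first request for a missing key (the "leader") fetches from the origin;
    concurrent requests for that key wait on an Event instead of sending their
    own origin request, and are released as soon as the leader finishes.
    """

    def __init__(self, fetch_from_origin: Callable[[str], bytes]):
        self._fetch = fetch_from_origin
        self._cache: Dict[str, bytes] = {}
        self._inflight: Dict[str, threading.Event] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> bytes:
        with self._lock:
            if key in self._cache:
                return self._cache[key]          # cache hit
            event = self._inflight.get(key)
            is_leader = event is None
            if is_leader:
                event = threading.Event()
                self._inflight[key] = event

        if not is_leader:
            event.wait()                         # queued behind the leader's fetch
            with self._lock:
                data = self._cache.get(key)
            if data is None:                     # leader failed: every waiter fails too
                raise RuntimeError(f"origin fetch failed for {key}")
            return data

        try:
            data = self._fetch(key)              # the only origin request for this key
            with self._lock:
                self._cache[key] = data
            return data
        finally:
            with self._lock:
                del self._inflight[key]
            event.set()                          # wake all waiting requests


# Demo: 20 simultaneous requests for one uncached segment.
origin_calls = []

def slow_origin(key: str) -> bytes:
    origin_calls.append(key)
    time.sleep(0.2)                              # simulated packaging + network delay
    return b"segment-bytes"

edge = CoalescingEdge(slow_origin)
results = []
threads = [threading.Thread(target=lambda: results.append(edge.get("R3/S3")))
           for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(origin_calls), len(results))  # 1 20
```

The failure branch also shows why a slow origin is costly: if the single leader fetch fails, every queued request fails with it.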

4. Proposed Enhancing and Tuning Configuration

Equation (1) represents the cache miss ratio. During peak traffic periods, we observed that over 98% of incoming requests were served directly from the CDN-edge. However, the remaining cache-miss requests, which were forwarded to the origin media server, caused a significant increase in server load. This led to slow response times, primarily due to the forensic “A/B” watermarking process, which forces the packager to generate two distinct versions of each video segment. Each video segment is also encoded in multiple bitrates and delivered in various formats, which further amplifies the computational overhead of packaging. Introducing a shield cache in front of the packager mitigates the impact of CDN cache misses: segments that have already been packaged are served from the shield, avoiding repeated on-demand packaging, which is a time-consuming task.

$$\text{Cache miss ratio} = \frac{\text{miss}}{\text{hit} + \text{miss}} \tag{1}$$
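The following is a small worked example of Equation (1), together with a count of how many packaged objects the origin may have to produce per segment once A/B watermarking is enabled. The request counts, number of representations, and number of formats are hypothetical values chosen for illustration, not measurements from the platform.

```python
def cache_miss_ratio(hits: int, misses: int) -> float:
    """Equation (1): miss / (hit + miss); defined as 0.0 when there is no traffic."""
    total = hits + misses
    return misses / total if total else 0.0


def max_packaged_objects(representations: int, formats: int, ab_variants: int = 2) -> int:
    """Upper bound on packaged objects per segment time slot, i.e. if clients
    request every representation, in every format, in both watermark variants."""
    return representations * formats * ab_variants


# Hypothetical edge counters for one peak interval (not platform data).
ratio = cache_miss_ratio(hits=980_000, misses=15_000)
print(f"miss ratio: {ratio:.2%}")  # miss ratio: 1.51%
print(max_packaged_objects(representations=6, formats=2))  # 24 (12 without A/B watermarking)
```

Even a miss ratio below 2% can therefore translate into substantial origin work, because each missed segment may need to be packaged in several representations, formats, and watermark variants.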

Another enhancement involves improved bitrate management, where mobile clients are discouraged from requesting the highest representation, which often leads to excessive buffering, particularly during peak times or in regions with inadequate network resources. This strategy aligns with mobile user preferences, prioritizing smooth playback over maximum resolution.

To further optimize the system, load balancing and the separation of mobile and big-screen clients across different origin servers and shield caches have been introduced. This architecture helps reduce unnecessary processing on the packager, as mobile clients typically request content at one level below the maximum bitrate. Combined with the enhanced bitrate management, these changes significantly reduce buffering risks and improve overall user experience, especially in bandwidth-constrained environments.
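One way to realize the separation of client classes is to route CDN misses to an origin pool selected by device class, and, within that pool, to hash the segment path so that repeated misses for the same segment land on the same shield cache. The pool names, hostnames, and the assumption that the device class is known at the CDN (for example, from the access token) are hypothetical.

```python
import zlib
from typing import Dict, List

# Hypothetical origin/shield pools per client class (placeholder hostnames).
ORIGIN_POOLS: Dict[str, List[str]] = {
    "mobile": ["shield-mobile-1.example.com", "shield-mobile-2.example.com"],
    "big_screen": ["shield-tv-1.example.com", "shield-tv-2.example.com",
                   "shield-tv-3.example.com"],
}


def pick_origin(device_class: str, segment_path: str) -> str:
    """Choose the origin shield for a CDN cache miss.

    The pool is selected by device class (unknown classes fall back to the
    big-screen pool). Within the pool, a stable hash of the segment path sends
    every miss for the same segment to the same shield, so a segment packaged
    once is likely to be reused rather than packaged again on another server.
    zlib.crc32 is used because, unlike the built-in hash() for strings, it is
    stable across processes.
    """
    pool = ORIGIN_POOLS.get(device_class, ORIGIN_POOLS["big_screen"])
    index = zlib.crc32(segment_path.encode("utf-8")) % len(pool)
    return pool[index]


print(pick_origin("mobile", "/live/ch1/720p/A/seg_1042.m4s"))
print(pick_origin("smart_tv", "/live/ch1/1080p/B/seg_1042.m4s"))  # falls back to big_screen
```

Plain modulo hashing reshuffles most segments when a pool grows or shrinks; consistent hashing would limit that churn at the cost of a slightly more complex implementation.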

Proper configuration of TTFB and related runtime parameters is crucial for maintaining optimal performance and user experience. Depending on server resources and traffic patterns, misconfigured settings can lead to undesirable behaviors such as increased latency, cache inefficiencies, and degraded responsiveness. Achieving the right balance in timeout configuration is especially important. If the timeout is set too high, it can mask underlying issues and cause unnecessary delays for end users, leading to a sluggish experience. Conversely, setting it too low can result in frequent timeouts and cache misses, even when the origin server is functioning correctly under load. This not only puts pressure on the server but may also reduce content availability and consistency. Careful tuning based on real-world traffic and performance data ensures a more resilient, responsive system that delivers content efficiently without compromising reliability.
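As a hedged illustration of data-driven tuning, the heuristic below derives a first-byte timeout from the 99th percentile of observed origin TTFB, multiplied by a safety margin and clamped to a range. The margin, floor, and ceiling are assumptions (the ceiling here is tied to a 6 s segment duration), not values used in this study.

```python
import statistics
from typing import Sequence


def suggest_first_byte_timeout(origin_ttfb_s: Sequence[float],
                               margin: float = 1.5,
                               floor_s: float = 1.0,
                               ceiling_s: float = 6.0) -> float:
    """Heuristic first-byte timeout: p99 of observed origin TTFB x margin, clamped.

    The floor avoids timing out healthy-but-busy origins; the ceiling (here an
    assumed 6 s segment duration) avoids waiting longer than the content being
    fetched lasts. Requires at least two samples (statistics.quantiles).
    """
    p99 = statistics.quantiles(origin_ttfb_s, n=100)[98]  # 99th of 99 cut points
    return min(max(p99 * margin, floor_s), ceiling_s)


# Hypothetical TTFB samples (seconds) collected at the CDN for origin fetches.
samples = [0.3] * 90 + [0.9] * 8 + [2.4] * 2
print(round(suggest_first_byte_timeout(samples), 2))  # 3.6
```

Recomputing such a value per event type (for example, large live sports events versus regular programming) is one way to follow traffic patterns rather than relying on a single static setting.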

To reduce first-byte timeouts, we tuned several configurations at both the origin-server and CDN levels:

Enhancing origin shield performance: Optimizing the size of the shield cache in front of the origin packager and efficiently distributing the load across origin servers helps reduce packager response times. Furthermore, faster SSD storage can significantly reduce latency and deliver higher input/output performance, which is critical for efficient, high-throughput video processing.
Edge pre-loading: This involves proactively loading upcoming video segments into the CDN-edge before clients request them. The next object request (nor) key in Common Media Client Data (CMCD) specifies a relative Uniform Resource Locator (URL) path to the next object the client intends to request. The CDN can use this path to trigger pre-fetching, caching content in advance to improve streaming performance. Although this hint prompts the CDN to consider pre-fetching, the client should not rely on the pre-fetch actually being executed; it merely serves as a suggestion that helps the CDN reduce latency and deliver a smoother user experience.
Tune origin timeout settings: Tuning origin timeout settings is essential for a responsive and reliable caching system. If the origin server responds slowly (for example, when resource-intensive processes such as watermarking add delay), it may be necessary to increase the timeout threshold to prevent unnecessary request failures. A sample backend configuration in VCL syntax (as used by Varnish) is shown below.

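vcl 4.1;

backend default {
    .host = "origin.example.com";   // origin media server (placeholder hostname)
    .connect_timeout = 1s;          // max time to establish the TCP connection
    .first_byte_timeout = 3s;       // max wait for the first byte of the response
    .between_bytes_timeout = 10s;   // max idle gap between bytes once data flows
}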

The default first byte timeout for the web server is approximately 10 s. However, multimedia streaming, especially live video delivery, requires significantly faster response times. In modern environments, CDNs and backend servers typically establish connections and deliver the “First Byte” (FB) in under 500 ms.
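Returning to the edge pre-loading item above, the sketch below shows how an edge component might extract the CMCD next object request (nor) hint from a request URL and resolve it to an absolute URL to pre-fetch. It assumes CMCD is sent as a URL-encoded CMCD query argument with comma-separated key=value pairs, string values in double quotes, and a URL-encoded nor value, as described in the CTA-5004 specification; escaped quotes inside strings are not handled, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Union
from urllib.parse import parse_qs, unquote, urljoin, urlsplit


def _split_outside_quotes(payload: str) -> List[str]:
    """Split on commas that are not inside double-quoted string values."""
    parts, current, in_quotes = [], [], False
    for ch in payload:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == "," and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


def parse_cmcd_query(url: str) -> Dict[str, Union[str, bool]]:
    """Parse the CMCD query argument of a request URL into {key: value}.

    Values are kept as strings (numbers are not converted); a key without
    a value (for example "su") is a boolean true.
    """
    values = parse_qs(urlsplit(url).query).get("CMCD")
    if not values:
        return {}
    result: Dict[str, Union[str, bool]] = {}
    for item in _split_outside_quotes(values[0]):  # parse_qs already decoded once
        key, sep, value = item.partition("=")
        result[key] = value.strip('"') if sep else True
    return result


def prefetch_url(request_url: str) -> Optional[str]:
    """Absolute URL an edge could pre-load, from the CMCD 'nor' hint, if present."""
    nor = parse_cmcd_query(request_url).get("nor")
    if not isinstance(nor, str):
        return None
    # The nor value is itself URL-encoded and relative to the current request.
    return urljoin(request_url, unquote(nor))


url = ("https://cdn.example.com/live/600kbps/seg_10.m4v"
       "?CMCD=bl%3D21300%2Cnor%3D%22..%252F300kbps%252Fseg_11.m4v%22%2Csu")
print(prefetch_url(url))  # https://cdn.example.com/live/300kbps/seg_11.m4v
```

Before acting on the hint, a real edge would also check whether the object is already cached or in flight and would rate-limit pre-fetches, since the hint is advisory.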

To optimize bandwidth utilization, we applied a fair yet differentiated bitrate allocation across various client classes. In this configuration, the ABR algorithm on the client side can adapt to the highest available video representation based on network conditions. However, to ensure smooth and uninterrupted playback on mobile devices, the ABR algorithm for mobile clients is configured with more conservative adaptation limits, restricting the ability to select the highest video quality. As a result, bandwidth and cache utilization remain efficient without compromising video quality.
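The conservative adaptation limit for mobile clients can be sketched as a per-class filter on the bitrate ladder, applied before the usual throughput-based choice. Hiding exactly one top rung for mobile mirrors the “one level below the maximum bitrate” behavior described in this section; the class names, ladder, and 0.8 safety factor are assumptions.

```python
from typing import Dict, List, Sequence

# Hypothetical policy: number of top ladder rungs hidden from each client class.
TOP_RUNGS_HIDDEN: Dict[str, int] = {"mobile": 1, "big_screen": 0}


def allowed_ladder(ladder_kbps: Sequence[int], device_class: str) -> List[int]:
    """Representations the ABR logic may choose from for this device class."""
    rungs = sorted(ladder_kbps)
    hidden = TOP_RUNGS_HIDDEN.get(device_class, 0)
    return rungs[: max(1, len(rungs) - hidden)]  # always keep at least one rung


def choose_bitrate(ladder_kbps: Sequence[int], device_class: str,
                   throughput_kbps: float, safety: float = 0.8) -> int:
    """Highest allowed rung that fits within safety x throughput, else the lowest."""
    rungs = allowed_ladder(ladder_kbps, device_class)
    fitting = [b for b in rungs if b <= throughput_kbps * safety]
    return fitting[-1] if fitting else rungs[0]


LADDER = [800, 1800, 3500, 6000]  # hypothetical ladder (kbps), not Table 2
print(choose_bitrate(LADDER, "mobile", throughput_kbps=20_000))      # 3500
print(choose_bitrate(LADDER, "big_screen", throughput_kbps=20_000))  # 6000
```

Applying the cap as a ladder filter leaves the ABR logic itself unchanged, so the same algorithm can serve every client class.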

5. Experimental Results

5.1. Material and Method

This study relies on a real-life video streaming platform where thousands of clients are concurrently connected through heterogeneous networks and begin streaming a live event secured with forensic server-side watermarking. Two different copies (“A” and “B”) of each video segment are available. The video is encoded at different bitrates (representations) according to the configuration in Table 2. In this setup, higher resolution corresponds to better video quality; therefore, clients with high-speed network connections can stream at higher quality, with higher playback bitrates, less buffering, and fewer fluctuations. Note that the measurements were repeated across multiple events, and the results presented are mean values over this series of events, which accounts for event-to-event variability.

5.2. Experimental Results

Figure 4 illustrates approximately 440,000 users concurrently streaming a live sports event (football), which initially resulted in a surge of 5xx errors at the CDN level. These errors were observed before the TTFB issue, which stemmed from high end-to-end latency, was addressed. This latency was introduced by the security measures applied to the video content, in particular the integration of watermarking. As shown, after the video delivery pipeline was optimized as previously described, the frequency of these errors dropped significantly. Clients no longer experienced excessive delays at the CDN level and received the first byte of video content promptly. Without this improvement, clients would often disconnect due to prolonged delays in downloading the initial data. As a result, both video quality and overall user experience improved noticeably.

This outcome indicates that, at the beginning of the live event, when a spike in client activity occurred as many attempted to play the video simultaneously, the error rate increased dramatically. At this stage, cache performance can significantly affect client access times. While caches are typically effective at serving frequently requested content, the probability of video segments being available in cache at the start of the event is low. As a result, the CDN forwards many incoming requests to the remote origin server and waits for a response. However, the high volume of traffic during this initial spike can lead to slower response times from the origin and an increased likelihood of 5xx errors.

Experimental results from our production platform indicate that the integration of watermarking leads to a slight degradation in overall KPIs, primarily due to declines in quality metrics such as increased playback failures and buffering events. From a server load perspective, watermarking introduces additional processing overhead, as the server must generate and package two distinct copies (“A” and “B”) of each video segment. This additional workload contributes to increased response times. Although the shield cache positioned in front of the origin server typically has limited storage capacity, efficient cache management can significantly mitigate server-side latency. By reducing the reliance on just-in-time packaging, an inherently time-consuming operation, the system improves content delivery performance, particularly under high-concurrency conditions.

As shown in Table 3, the reduction in 5xx errors correlates with enhanced streaming quality, as reflected in shorter buffering durations and fewer playback failures. Table 4 compares the QoE parameters between the conventional approach and the proposed enhancement, both of which apply watermarking within the video streaming platform. Higher received bitrate, lower start-up time, minimal bitrate fluctuation, and efficient utilization of network bandwidth are key factors contributing to improved video quality on the end-user side.

Equation (2) illustrates the concept behind calculating the fluctuation of the bitrate. A lower fluctuation value indicates that the client experienced smoother playback, which is preferred. The Connection Induced Rebuffering Ratio (CIRR) is used to assess how much rebuffering (video playback interruptions) is caused specifically by network connection issues, such as insufficient bandwidth and high latency, rather than by other factors like the client’s buffer settings or the behavior of the video player. As seen, there is an improvement in all factors, indicating a positive trend across all measured parameters. This suggests that the optimizations implemented have led to noticeable enhancements in performance and quality.

$$\text{Bitrate Fluctuation} = \frac{\sum \text{Bitrate\_Switches}_{\text{ended plays}}}{\text{Ended\_Plays}_{\text{session}}} \tag{2}$$
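Equation (2) can be computed from per-play bitrate timelines as follows. The sketch assumes that every change of representation counts as one switch and that each ended play is represented by the sequence of bitrates of its downloaded segments; the platform’s exact counting rules are not specified, and the sample data are hypothetical.

```python
from typing import List, Sequence


def count_switches(bitrates_kbps: Sequence[int]) -> int:
    """Number of representation changes in one play's bitrate timeline."""
    return sum(1 for prev, cur in zip(bitrates_kbps, bitrates_kbps[1:]) if cur != prev)


def bitrate_fluctuation(ended_plays: Sequence[Sequence[int]]) -> float:
    """Equation (2): total bitrate switches across ended plays / number of ended plays."""
    if not ended_plays:
        return 0.0
    return sum(count_switches(p) for p in ended_plays) / len(ended_plays)


# Hypothetical per-segment bitrate timelines (kbps) for three ended plays.
plays: List[List[int]] = [
    [1800, 3500, 3500, 3500],   # 1 switch
    [3500, 1800, 3500, 6000],   # 3 switches
    [6000, 6000, 6000],         # 0 switches
]
print(bitrate_fluctuation(plays))  # 4 switches / 3 ended plays, about 1.33
```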

Analysis of mobile client behavior indicates that applying bitrate management leads to improved QoE. While it is true that the ABR algorithm tends to be aggressive—often attempting to download video segments from the highest available representation—in a competitive environment with limited resources, this can result in increased buffering, particularly for legacy devices with lower processing capabilities. These devices may struggle to handle high-resolution or watermarked video segments efficiently. By applying a maximum bitrate restriction, we avoid these issues, ultimately achieving similar or even better QoE for mobile users. Figure 5 clearly demonstrates the effectiveness of the applied bitrate restriction.

We also found that different client devices exhibit varying performance levels. Figure 6 compares the performance of iOS and Android devices with and without the impact of the first byte delay. Typically, iOS devices demonstrate better performance, and this becomes even more evident after mitigating the first byte issue. Additionally, we observed that fixed clients, such as smart TVs, are more susceptible to delays than mobile devices. This is likely due to limitations in the underlying network infrastructure in certain regions during peak times, which hampers the ability to support high-resolution streaming.

5.3. Discussion and Findings

End-to-end network latency, caused by network congestion or heavy load on the origin server, can slow data transmission, affecting the time it takes for the segment to reach the client [34]. This leads to a decreased QoE, due to issues such as lower bitrate or rebuffering. During peak traffic, when the origin server is overwhelmed with additional requests, the watermarking process can strain the server and negatively impact the end-users’ QoE.

When a client makes a request to an origin server, it can reuse the same connection for multiple requests. The keep-alive timeout defines how long that connection remains open while waiting for additional requests; if no activity occurs during that period, the server closes the connection. This reduces the number of Transmission Control Protocol (TCP) handshakes and the associated overhead, while preventing unused connections from consuming resources indefinitely. This timeout is configurable at different points along the end-to-end delivery path, as expressed in Equation (3).

$$\text{KeepAliveTimeout}: \; \text{Tier 1} < \text{Tier 2} < \text{Origin server} \tag{3}$$
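The ordering in Equation (3) can be checked mechanically. A common rationale (not stated in this paper) is that each tier acts as the client of the next hop, so its idle keep-alive timeout should expire before the upstream server’s; otherwise the downstream tier may reuse a connection that the upstream has just closed, and the request fails. The hop names follow Equation (3); the timeout values are hypothetical.

```python
from typing import List, Sequence, Tuple


def check_keepalive_chain(hops: Sequence[Tuple[str, float]]) -> List[str]:
    """Check Equation (3): keep-alive timeouts must strictly increase from the
    client-facing tier toward the origin. Returns the violations found."""
    problems = []
    for (down_name, down_s), (up_name, up_s) in zip(hops, hops[1:]):
        if not down_s < up_s:
            problems.append(f"{down_name} ({down_s} s) must be < {up_name} ({up_s} s)")
    return problems


# Hypothetical keep-alive timeouts in seconds, ordered from edge to origin.
print(check_keepalive_chain([("Tier 1", 30), ("Tier 2", 60), ("Origin server", 75)]))
# []
print(check_keepalive_chain([("Tier 1", 60), ("Tier 2", 60), ("Origin server", 75)]))
# ['Tier 1 (60 s) must be < Tier 2 (60 s)']
```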

TTFB refers to the time between the opening of a connection and the moment the client receives the first byte of the response to its request. According to [35], a TTFB of 0.8 s or less is considered good, indicating a responsive server, while values exceeding 1.8 s are classified as poor and may reflect latency issues or server-side delays. At the CDN level, the “first byte” response time is the delay the CDN experiences in receiving the first byte from the origin server. This value is typically less than the “keep-alive” timeout. Problems arise when the first-byte timeout expires before the edge server has received the first byte. In such cases, the connection times out, and a 5xx error is returned to all clients at the edge that were waiting for the segment. Because a single CDN-edge serves many clients, this dramatically increases the error rate at the edge, resulting in lower QoE for users. Fine-tuning the first-byte response time at the CDN-edge improves performance by reducing the number of such errors. This, in turn, reduces the load on the origin server through decreased packaging time, as packaging involves substantial processing effort. Furthermore, replacing the origin servers’ hard disks with faster storage offering higher write throughput reduces response time.
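The thresholds from [35] and the fan-out effect of an edge timeout can be sketched in a few lines. The label for the range between 0.8 s and 1.8 s, the function names, and the single-miss model are assumptions made for illustration.

```python
from typing import Dict, Optional

GOOD_TTFB_S = 0.8  # "good" threshold reported in [35]
POOR_TTFB_S = 1.8  # above this, "poor" according to [35]


def classify_ttfb(ttfb_s: float) -> str:
    if ttfb_s <= GOOD_TTFB_S:
        return "good"
    if ttfb_s <= POOR_TTFB_S:
        return "intermediate"  # label for the in-between range is an assumption
    return "poor"


def edge_miss_outcome(origin_first_byte_s: Optional[float],
                      first_byte_timeout_s: float,
                      waiting_clients: int) -> Dict[str, object]:
    """Simplified model of one collapsed cache miss at the CDN-edge.

    If the origin's first byte has not arrived when the first-byte timeout
    expires (None = never arrives), every client queued on that miss gets a
    5xx; otherwise all of them are served.
    """
    timed_out = origin_first_byte_s is None or origin_first_byte_s > first_byte_timeout_s
    return {"status": "5xx" if timed_out else "200",
            "clients_failed": waiting_clients if timed_out else 0}


print(classify_ttfb(0.45), classify_ttfb(1.2), classify_ttfb(2.5))  # good intermediate poor
print(edge_miss_outcome(3.4, first_byte_timeout_s=3.0, waiting_clients=250))
# {'status': '5xx', 'clients_failed': 250}
```

The model makes the asymmetry explicit: one slow origin response for a popular segment fails every client queued behind it, which is why origin latency during peaks translates so directly into edge error rates.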

6. Conclusions

Providing secure video streaming without compromising the user experience is one of the primary challenges faced by OTT providers. Implementing content protection methods such as watermarking and DRM adds processing overhead on both the origin media server and the packager, which can increase response times. We observed that, even in the worst case, clients downloaded a 6 s video segment in less than 2 s, which is acceptable. However, they occasionally experienced playback interruptions. After watermarking the video content, we found that the client-side error rate increased, which lowered the KPI values. Through in-depth monitoring and log analytics, we traced the issue to first-byte timeouts at the CDN level.

Optimizing the introduced shield cache, load-balancing traffic across origin servers by client type (e.g., mobile devices vs. large-screen displays), and implementing effective cache purging strategies together improve server response times. Furthermore, optimizing the TTFB improves client-side performance and QoE by reducing the error rate at the CDN level.
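The shield cache and client-type load balancing summarized above can be pictured with the minimal Python sketch below. It is not the production system evaluated in this paper: `ShieldCache`, `route`, `fake_packager`, the origin names, the prefix-based `purge`, and the 6 s TTL are all hypothetical choices made for illustration. What it shows is that, with a cache placed in front of the packager, repeated requests for the same segment trigger one packaging run per origin pool instead of one per request.

```python
import time
from typing import Callable, Dict, Tuple


class ShieldCache:
    """Minimal TTL cache in front of a packager (hypothetical interface).

    Requests for the same (origin, segment) within `ttl_s` seconds are served from
    memory, so the packager runs once per segment per origin pool, not per request.
    """

    def __init__(self, packager: Callable[[str, str], bytes], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.packager = packager
        self.ttl_s = ttl_s
        self.clock = clock
        self.store: Dict[Tuple[str, str], Tuple[float, bytes]] = {}

    def get(self, origin: str, segment: str) -> bytes:
        key = (origin, segment)
        now = self.clock()
        cached = self.store.get(key)
        if cached is not None and now - cached[0] < self.ttl_s:
            return cached[1]  # hit: no packaging work
        body = self.packager(origin, segment)  # miss or expired: package once
        self.store[key] = (now, body)
        return body

    def purge(self, segment_prefix: str) -> int:
        """Remove entries whose segment name starts with `segment_prefix`; return the count."""
        stale = [key for key in self.store if key[1].startswith(segment_prefix)]
        for key in stale:
            del self.store[key]
        return len(stale)


# Hypothetical mapping of client types to separate origin pools.
ORIGIN_BY_CLIENT = {"mobile": "origin-a", "large-screen": "origin-b"}


def route(client_type: str) -> str:
    return ORIGIN_BY_CLIENT.get(client_type, "origin-a")  # default pool is an assumption


calls = []


def fake_packager(origin: str, segment: str) -> bytes:
    calls.append((origin, segment))  # record each (expensive) packaging run
    return f"{origin}:{segment}".encode()


shield = ShieldCache(fake_packager, ttl_s=6.0)  # TTL of one 6 s segment (assumed)
for client in ["mobile", "mobile", "large-screen", "mobile"]:
    shield.get(route(client), "live/seg_100.m4s")
print(len(calls))                    # 2: one run per origin pool, not one per request
print(shield.purge("live/seg_100"))  # 2
```

A real shield tier would also need to coalesce concurrent misses and bound its memory; both are omitted here to keep the sketch short.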

As future work, we aim to evolve towards a CDN-agnostic architecture in which applied watermarks work across multiple CDNs without compromising QoE. Decoupling watermarking from specific CDNs will provide flexibility and scalability in content delivery. With this approach, we can optimize the end-user experience while ensuring that watermarks remain intact and effective regardless of the CDN used. This migration is crucial for adapting to the ever-changing content delivery landscape, improving global scalability, and providing a more seamless, unified experience for users worldwide.

Author Contributions

Conceptualization, R.K.; methodology, R.K. and E.K.; investigation, R.K., P.J.C., and E.K.; implementation, R.K. and E.K.; writing (original draft preparation), R.K.; writing (review and editing), R.K., P.J.C., and E.K.; visualization, R.K. and E.K. All authors have read and agreed to the published version of the manuscript.

Funding

The APC was covered by internal funds not pertaining to any specific project.

Data Availability Statement

The datasets presented in this article are not readily available because of third-party restrictions.

Acknowledgments

This research has been conducted with the support of the Digiturk beIN Media Group R&D team in close collaboration with Istinye University.

Conflicts of Interest

Author Emre Karsli is employed by the R&D Department, Digiturk beIN Media Group, Istanbul 34353, Türkiye. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Farahani, R.; Azimi, Z.; Timmerer, C.; Prodan, R. Towards ai-assisted sustainable adaptive video streaming systems: Tutorial and survey. arXiv 2024, arXiv:2406.02302.
2. Jahromi, H.Z.; Delaney, D.T.; Hines, A. Beyond first impressions: Estimating quality of experience for interactive web applications. IEEE Access 2020, 8, 47741–47755.
3. Gao, Q.; Dey, P.; Ahammad, P. Perceived performance of top retail webpages in the wild: Insights from large-scale crowdsourcing of above-the-fold qoe. In Proceedings of the Workshop on QoE-Based Analysis and Management of Data Communication Networks, Los Angeles, CA, USA, 21 August 2017; pp. 13–18.
4. Thelagathoti, R.K.; Mastorakis, S.; Shah, A.; Bedi, H.; Shannigrahi, S. Named data networking for content delivery network workflows. In Proceedings of the 2020 IEEE 9th International Conference on Cloud Networking (CloudNet), Piscataway, NJ, USA, 9–11 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–7.
5. Lyko, T.; Broadbent, M.; Race, N.; Nilsson, M.; Farrow, P.; Appleby, S. Improving quality of experience in adaptive low latency live streaming. Multimed. Tools Appl. 2024, 83, 15957–15983.
6. Khan, M.A.; Baccour, E.; Chkirbene, Z.; Erbad, A.; Hamila, R.; Hamdi, M.; Gabbouj, M. A survey on mobile edge computing for video streaming: Opportunities and challenges. IEEE Access 2022, 10, 120514–120550.
7. Bentaleb, A.; Zhan, Z.; Tashtarian, F.; Lim, M.; Harous, S.; Timmerer, C.; Hellwagner, H.; Zimmermann, R. Low latency live streaming implementation in dash and hls. In Proceedings of the 30th ACM International Conference on Multimedia, Lisbon, Portugal, 10–14 October 2022; pp. 7343–7346.
8. Kalan, R.; Dulger, I. A Survey on QoE Management Schemes for HTTP Adaptive Video Streaming: Challenges, Solutions, and Opportunities. IEEE Access 2024, 12, 170803–170839.
9. Villegas-Ch, W.; García-Ortiz, J.; Govea, J. A comprehensive approach to image protection in digital environments. Computers 2023, 12, 155.
10. Helmy, M.; Torkey, H. Secured Audio Framework Based on Chaotic-Steganography Algorithm for Internet of Things Systems. Computers 2025, 14, 207.
11. Aberna, P.; Agilandeeswari, L. Digital image and video watermarking: Methodologies, attacks, applications, and future directions. Multimed. Tools Appl. 2024, 83, 5531–5591.
12. Asikuzzaman, M.; Pickering, M.R. An overview of digital video watermarking. IEEE Trans. Circuits Syst. Video Technol. 2017, 28, 2131–2153.
13. Dhevanandhini, G.; Yamuna, G. An effective and secure video watermarking using hybrid technique. Multimed. Syst. 2021, 27, 953–967.
14. Artru, R.; Gouaillard, A.; Ebrahimi, T. Digital Watermarking of video streams: Review of the State-of-the-Art. arXiv 2019, arXiv:1908.02039.
15. Su, P.C.; Kuo, T.Y.; Li, M.H. A practical design of digital watermarking for video streaming services. J. Vis. Commun. Image Represent. 2017, 42, 161–172.
16. Elrowayati, A.A.; Alrshah, M.A.; Abdullah, M.F.L.; Latip, R. HEVC watermarking techniques for authentication and copyright applications: Challenges and opportunities. IEEE Access 2020, 8, 114172–114189.
17. Ye, G.; Gao, J.; Wang, Y.; Song, L.; Wei, X. ItoV: Efficiently adapting deep learning-based image watermarking to video watermarking. In Proceedings of the 2023 International Conference on Culture-Oriented Science and Technology (CoST), Xi’an, China, 11–14 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 192–197.
18. Mareen, H.; Van Wallendael, G.; Lambert, P. Implementation-free forensic watermarking for adaptive streaming with A/B watermarking. In Proceedings of the Sixth International Congress on Information and Communication Technology: ICICT 2021, London, UK, 25–26 February 2021; Springer: Berlin/Heidelberg, Germany, 2021; Volume 1, pp. 325–339.
19. Ayubi, P.; Jafari Barani, M.; Yousefi Valandar, M.; Yosefnezhad Irani, B.; Sedagheh Maskan Sadigh, R. A new chaotic complex map for robust video watermarking. Artif. Intell. Rev. 2021, 54, 1237–1280.
20. Takale, S.; Mulani, A. DWT-PCA based video watermarking. J. Electron. Comput. Netw. Appl. Math. (JECNAM) ISSN 2022, 2, 6.
21. He, M.; Wang, H.; Zhang, F.; Abdullahi, S.M.; Yang, L. Robust blind video watermarking against geometric deformations and online video sharing platform processing. IEEE Trans. Dependable Secur. Comput. 2022, 20, 4702–4718.
22. Asikuzzaman, M.; Mareen, H.; Moustafa, N.; Choo, K.K.R.; Pickering, M.R. Blind camcording-resistant video watermarking in the DTCWT and SVD domain. IEEE Access 2022, 10, 15681–15698.
23. Kaczyński, M.; Piotrowski, Z. High-quality video watermarking based on deep neural networks and adjustable subsquares properties algorithm. Sensors 2022, 22, 5376.
24. Mali, S.D.; Agilandeeswari, L. Non-redundant shift-invariant complex wavelet transform and fractional gorilla troops optimization-based deep convolutional neural network for video watermarking. J. King Saud Univ.-Comput. Inf. Sci. 2023, 35, 101688.
25. Chen, S.; Malik, A.; Zhang, X.; Feng, G.; Wu, H. A fast method for robust video watermarking based on Zernike moments. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 7342–7353.
26. Hazim, H.T.; Alseelawi, N.; ALRikabi, H.T. A Novel Method of Invisible Video Watermarking Based on Index Mapping and Hybrid DWT-DCT. Int. J. Online Biomed. Eng. 2023, 19, 155–173.
27. Liu, Q.; Yang, S.; Liu, J.; Zhao, L.; Xiong, P.; Shen, J. An efficient video watermark method using blockchain. Knowl.-Based Syst. 2023, 259, 110066.
28. Fernandez, P.; Elsahar, H.; Yalniz, I.Z.; Mourachko, A. Video seal: Open and efficient video watermarking. arXiv 2024, arXiv:2412.09492.
29. Lin, L.; Wu, D.; Wang, J.; Chen, Y.; Zhang, X.; Wu, H. Automatic, robust and blind video watermarking resisting camera recording. IEEE Trans. Circuits Syst. Video Technol. 2024, 40, 13413–13426.
30. Aissaoui, N.E.; Azzaz, M.S.; Kaibou, R.; Tanougast, C. Efficient FPGA implementation of chaos-based real-time video watermarking system in spatial and DWT domain using QIM technique. J. Real-Time Image Process. 2025, 22, 40.
31. He, M.; Wang, H.; Zhang, F.; Wang, H. Design Principles for Orthogonal Moments in Video Watermarking. IEEE Trans. Dependable Secur. Comput. 2025, 22, 5603–5616.
32. Zhang, A.; Li, Q.; Chen, Y.; Ma, X.; Zou, L.; Jiang, Y.; Xu, Z.; Muntean, G.M. Video super-resolution and caching: An edge-assisted adaptive video streaming solution. IEEE Trans. Broadcast. 2021, 67, 799–812.
33. Kalan, R.S. Improving quality of HTTP adaptive streaming with server and network-assisted DASH. In Proceedings of the 2021 17th International Conference on Network and Service Management (CNSM), Izmir, Turkey, 25–29 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 244–248.
34. Shi, W.; Li, Q.; Yu, Q.; Wang, F.; Shen, G.; Jiang, Y.; Xu, Y.; Ma, L.; Muntean, G.M. A Survey on Intelligent Solutions for Increased Video Delivery Quality in Cloud-Edge-End Networks. IEEE Commun. Surv. Tutor. 2025, 27, 1363–1394.
35. Barry, P.; Wagner, J. Time to First Byte (TTFB). In Speed Metrics Guide; Springer: Berlin/Heidelberg, Germany, 2025.

Figure 1. A conceptual overview of end-to-end adaptive bitrate video streaming architecture and operational workflow [8]. Captured video is encoded into multiple bitrates (represented in different colors in the figure) and distributed globally via multiple CDNs. Clients connect to the nearest CDN and dynamically download video segments that best match their current network throughput, ensuring smooth playback and optimal quality.

Figure 2. Proposed and enhanced ABR streaming integrated with server-side watermarking. Before initiating playback, each client must be authenticated by both the OTT and the CDN. Once authorized, the client connects to the nearest CDN and dynamically downloads video segments.

Figure 3. Video delivery system and caching mechanism at different tiers, adapted from [33], with modifications. Caching video segments at the CDN level reduces the load on the origin server and significantly improves response time.

Figure 4. Impact of the applied enhancements on 5xx errors during a high-traffic live sports event.

Figure 5. Impact of the applied enhancements on the quality metric for mobile clients.

Figure 6. Impact of the applied enhancements on the quality metric for different clients.

Table 1. Summary of recent research on video watermarking approaches.

Ref.	Year	Description of Applied Watermarking Method	Lab	Live
[19]	2021	Video watermarking based on a chaotic complex map	✓	✗
[20]	2022	Video watermarking based on the discrete wavelet transform	✓	✗
[21]	2022	Video watermarking based on low-order recursive Zernike moments	✓	✗
[22]	2022	WM based on singular value decomposition and wavelet transform	✓	✗
[23]	2022	Leveraging deep neural networks for watermarking	✓	✗
[24]	2023	Using optimized deep learning for watermarking	✓	✗
[25]	2023	Leveraging Zernike moments for video watermarking	✓	✗
[26]	2023	Video watermarking based on index mapping	✓	✗
[27]	2023	An efficient video watermarking method using blockchain	✓	✗
[28]	2024	Introduces an approach for video WM based on blockchain	✓	✗
[29]	2024	A robust video watermarking method resisting camera recording	✓	✗
[30]	2025	Chaos-based video watermarking in the DWT domain	✓	✗
[31]	2025	Texture-aware video WM based on orthogonal moments	✓	✗
Our study		Addresses the TTFB problem and QoE degradation in a watermarked live video streaming platform, especially during peak traffic periods	✗	✓

Table 2. Video configuration and setting of different bitrates.

Symbol	#R1	#R2	#R3	#R4
Resolution	640 × 360	960 × 540	1280 × 720	1920 × 1080

Table 3. Effects of content watermarking on KPI values.

Streaming	Playback Failure	Buffering	KPI
Conventional	14%	0.9%	>73%
Proposed	10%	0.7%	>78%

Table 4. Comparison of QoE parameters in the conventional and proposed approaches.

Streaming	Rec. Bitrate (Mbps)	Start-Up Delay (s)	Number of Fluctuations	CIRR (%)
Conventional	2.8	5.1	5.3	0.19
Proposed	3.1	3.9	4.4	0.13

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

