Article

Lower-Latency Screen Updates over QUIC with Forward Error Correction

Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada
*
Author to whom correspondence should be addressed.
Future Internet 2025, 17(7), 297; https://doi.org/10.3390/fi17070297
Submission received: 25 May 2025 / Revised: 19 June 2025 / Accepted: 19 June 2025 / Published: 30 June 2025

Abstract

There are workloads that do not need the total data ordering enforced by the Transmission Control Protocol (TCP). For example, Virtual Network Computing (VNC) sends a sequence of pixel-based updates in which the order of rectangles can be relaxed. However, VNC runs over TCP and can incur higher latency due to unnecessary blocking to ensure total ordering. By using Quick UDP Internet Connections (QUIC) as the underlying protocol, we are able to implement a partial order delivery approach, which can be combined with Forward Error Correction (FEC) to reduce data latency. Our earlier work on consistency fences provides a mechanism and semantic foundation for partial ordering. Our new evaluation on the Emulab testbed, with two different synthetic workloads for streaming and non-streaming updates, shows that our partial order and FEC strategy can reduce the blocking time and inter-delivery time of rectangles compared to total delivery. For one workload, partially ordered data with FEC can reduce the 99-percentile message-blocking time to 0.4 ms, versus 230 ms with totally ordered data. That workload used 0.5% packet loss, 100 ms Round-Trip Time (RTT), and 100 Mbps bandwidth. We study the impact of varying the packet-loss rate, RTT, bandwidth, and congestion control algorithm (CCA) and demonstrate that the latency improvements of partial order and FEC grow as we increase packet loss and RTT, especially with the emerging Bottleneck Bandwidth and Round-Trip propagation time (BBR) CCA.

1. Introduction

For appropriate workloads, partially ordered message delivery can greatly reduce message latency. Remote desktops, video games, telemetry data, and logging information [1] are some examples that require reliable data delivery but do not require strict data (i.e., total) ordering. The pixels of one rectangle, or a specific data record, might require an atomic update, but different rectangles or different records can be updated in a partial order. Similar semantics have been noted and exploited for optimizations in related domains, such as append-only semantics for some file systems (e.g., the Google File System [2]).
The total ordering of message delivery is more common because total orders are easier to reason about. However, total ordering can result in additional overheads and higher latency when packets are lost. If there is flexibility in the ordering, then retransmissions of lost packets can be overlapped with regular transmissions, and unnecessary message blocking time can be avoided.
One general way to represent ordering constraints is as a directed acyclic graph (DAG) (Figure 1) [3]. For example, messages M1, M2, and M3 have no inter-message dependencies and can be delivered to the destination in any order (among those three messages), but M9 must be delivered before M7. Directed edges show the ordering constraints. Figure 2 compares total order delivery (such as with the Transmission Control Protocol (TCP)) vs. partial order delivery for Figure 1 if message M6 is lost. With total order delivery, we cannot deliver M7–M9 while we are waiting for the retransmission of M6 (the green block). This source of latency is known as the Head-Of-Line (HOL) blocking problem. However, with partial order delivery, we can first deliver M7–M9 after M5, with lower latency for those blocks, and overlap those deliveries with the retransmission of M6, resulting in lower total latency for delivering M1–M11.
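To make the DAG semantics concrete, a receiver only needs to check that all of a message's predecessors have already been delivered before delivering it. The following minimal C++ sketch is illustrative only; the types and names are ours, not from any implementation discussed in this paper:

```cpp
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Minimal sketch of DAG-constrained delivery (illustrative only).
// deps[m] holds the messages that must be delivered before message m,
// e.g., deps[4] = {1, 2} if M4 depends on M1 and M2 as in Figure 1.
struct PartialOrderReceiver {
  std::unordered_map<int, std::vector<int>> deps;  // ordering edges
  std::unordered_set<int> delivered;               // already delivered

  // A received message can be delivered as soon as all of its
  // predecessors in the DAG have been delivered, regardless of the
  // arrival order of unrelated messages.
  bool deliverable(int msg) const {
    auto it = deps.find(msg);
    if (it == deps.end()) return true;  // no constraints (e.g., M1..M3)
    for (int pred : it->second)
      if (!delivered.count(pred)) return false;
    return true;
  }
};
```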
The challenge is designing a mechanism and semantics for specifying when ordering matters and when it does not. We introduced the concept of a consistency fence (CF) [4] as a mechanism to specify when ordering between packets and messages matters (inspired by memory fences from data consistency models [5]). In other words, CFs are a way of communicating ordering requirements between the sender and receiver. The sender inserts a CF to let the receiver know that all data before that CF need to be delivered before the data after that CF, but there is no ordering among the data between two CFs. In that work, we implemented the idea of CFs over the UDP-based Data Transfer (UDT) protocol [6] for partial order delivery and evaluated the latency improvements with and without an XOR-based Forward Error Correction (FEC) method for a synthetic workload.
In our new evaluation, we use Quick UDP Internet Connections (QUIC) [7] because it supports a kind of partial ordering in which the data sent over each QUIC stream are delivered in total order, but there is no ordering among different streams. This way, QUIC can reduce the blocking time of the different web resources of an HTTP/3 webpage by sending each resource over a separate stream. However, the download completion time of each resource still depends on the round-trip time (RTT) in the case of packet loss. In our previous work, we showed that integrating FEC for high-priority resources could further reduce resource download completion time [8].
Some network workloads with partial ordering requirements can take advantage of running over QUIC if there are appropriate mechanisms and semantics. For example, remote desktop applications and online video games have periodic screen updates in which different parts or rectangles of the screen can be updated in any order as long as each screen update is finished before starting the next one. Over QUIC, different updates can be sent over different QUIC streams, but QUIC does not have any mechanism to make sure that we finish delivering all of the parts of one screen update before starting to deliver the next one.
In this paper, we extend and combine two of our previous contributions [4,8] to implement a partial ordering approach over QUIC with FEC to reduce the latency and blocking time for pixel-based screen sharing protocols such as Virtual Network Computing (VNC). VNC runs over TCP and suffers from the unnecessary added latency forced by TCP's total order delivery. We also include the OpenFEC library [9] in our implementation to have access to its FEC codes, such as Reed–Solomon, although our focus is not on evaluating any particular FEC code.
In Section 2, we review some of the previous works related to supporting partial ordering and explain the background regarding our previous work on CFs and QUIC. In Section 3, we explain VNC, the FrameBuffer protocol, and the two synthetic workloads that we used for our evaluation. Section 4 includes the details of our implementation to add FEC and the partial ordering needed for our workloads over QUIC, while Section 5 explains our Emulab testbed setup used for our results in Section 6.
As discussed, our empirical results are based on synthetic workloads evaluated on the Emulab testbed. There are no widely accepted benchmarks for screen sharing, so we started with simpler tests that helped us isolate low-level effects (e.g., the different impacts of partial ordering and FEC on different congestion control algorithms). We speculated that real-world scenarios have more dynamic interactions due to contention between concurrent data streams, resulting in complicated packet-loss patterns. However, our empirical evaluation established a baseline for the benefit of integrating partial ordering and FEC, and future work will explore other workloads.

2. Background and Related Work

One way of communicating partial ordering requirements is using some notion of a dependency graph on the sender side and sending it to the receiver [1,3]. The receiver then considers that graph when delivering the data. Although a dependency graph could express the ordering requirements precisely, it also adds complexity, and the sender must have the dependency knowledge in advance.
Two other projects that support some notion of partial ordering, and avoid the total ordering of TCP to solve the HOL blocking problem, are the Stream Control Transmission Protocol (SCTP) [10] and QUIC [7]. Both SCTP and QUIC use multi-streaming over a single connection to relax the ordering of messages among different streams while supporting total ordering within each stream. However, not all workloads with partial ordering fit this scheme. For example, with the messages in Figure 1, SCTP/QUIC does not allow us to send independent messages (e.g., M1, M2, and M3) over different streams to relax ordering, because then there would be no valid stream choice for M4, as it depends on both M1 and M2. For example, if we send M4 on the same stream as M1, then we cannot ensure that its dependency on M2 will be satisfied, as there is no ordering among streams. Therefore, we can only use one stream for the whole DAG, which entails total ordering.

2.1. Consistency Fences for Partial Ordering

In our previous work [4], we proposed the consistency fence (CF) mechanism, in which the sender continues to transmit independent messages until there is a message that depends on previously sent messages; the sender then inserts a fence and sends that message. In other words, all the messages before each fence need to be delivered before any messages after the fence on the receiver side. The CF is analogous to the memory fence/barrier instructions used by CPUs to enforce an ordering constraint on memory operations before and after the fence [5].
In Figure 3, we show how the dependency graph in Figure 1 can be sent using CFs (the vertical blue lines). The sender transmits the first three messages (M1 to M3) and then inserts a fence because both M4 and M5 depend on at least one of the first three blocks, although there are no inter-dependencies between M4 and M5 themselves. The same happens for the third and fourth groups of messages, which are M6–M8 and M9–M11, respectively.
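As an illustrative sketch (not the actual implementation from [4]), the sender-side rule can be stated as: emit a fence whenever the next message depends on any message sent since the last fence. Assuming a hypothetical depends_on predicate:

```cpp
#include <functional>
#include <unordered_set>
#include <vector>

// Illustrative sketch of consistency-fence insertion on the sender.
// depends_on(a, b) is assumed to answer whether message a depends on b.
void send_with_fences(const std::vector<int>& messages,
                      const std::function<bool(int, int)>& depends_on,
                      const std::function<void(int)>& send_msg,
                      const std::function<void()>& send_fence) {
  std::unordered_set<int> since_last_fence;
  for (int m : messages) {
    bool dependent = false;
    for (int prev : since_last_fence)
      if (depends_on(m, prev)) { dependent = true; break; }
    if (dependent) {
      // All messages before the fence must be delivered before any
      // message after it; messages between two fences are unordered.
      send_fence();
      since_last_fence.clear();
    }
    send_msg(m);
    since_last_fence.insert(m);
  }
}
```

For the DAG of Figure 1, this rule reproduces the fences of Figure 3: M4 depends on messages in the current group {M1, M2, M3}, so a fence precedes it, while M5 joins M4's group because its dependency (M3) is already ordered before the fence.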
In our previous work, we performed the evaluation over the UDP-based Data Transfer (UDT) protocol [6], which is a user-level, reliable protocol designed for large data transfers over wide-area networks (WANs). We also implemented an XOR-based FEC method over the UDT protocol. In this paper, we replace UDT with QUIC, a more recent, actively developed transport layer over UDP, and use the OpenFEC library [9] instead of a simple XOR FEC to access more efficient FEC methods such as Reed–Solomon. To implement CFs over QUIC for the screen updates workload, we send each update over a QUIC stream and handle the partial order delivery at the destination. We explain the details of our design and implementation in Section 4.

2.2. QUIC and FEC

QUIC replaced TCP as the underlying protocol for HTTP/3, which is the latest version of HTTP. QUIC keeps the useful features of TCP, such as reliability, congestion control, and flow control, while also supporting other features that can improve performance, such as the following:
  • Integrating TLS as a part of the protocol for security.
  • Multi-streaming of web resources to avoid the HOL blocking problem.
  • A 0-RTT connection setup for clients that have previously connected to a server, as opposed to TCP’s classic three-way handshake for every new connection.
  • User-space implementations, which provide easier development and deployment compared to kernel code.
QUIC [7] supports multi-streaming, which allows us to define independent streams all running over a single QUIC connection. There are several QUIC implementations [11], which differ slightly in their details and in the extensions they support. We used ngtcp2 [12]/nghttp3 [13] for this project for three main reasons: (1) the code is readable and extensible, (2) its active forum helped us get answers to our questions quickly, and (3) it includes the Extensible Prioritization Scheme extension, which proved useful for turning FEC on and off in our experiments. The versions of ngtcp2/nghttp3 used for the results of this paper are ngtcp2 v1.0.1 and nghttp3 v1.0.0, which were downloaded in November 2023.
FEC is a well-known technique to provide reliability and improve latency in different network layers, including the transport layer. Using FEC, we can avoid the waiting time for receiving the retransmission of a lost packet/frame at the TCP layer [14,15]. Recently, QUIC [16] replaced TCP for HTTP-based applications, and the possibility of taking advantage of FEC over QUIC is discussed in the papers that we focus on in this section. The first version of QUIC by Google [16] had FEC as a built-in option with a simple XOR-based method. However, FEC was dropped from QUIC due to its overhead and its negative impact on performance [17].
There are some adaptive approaches to FEC in the literature to balance the overhead. For example, Garrido et al. [18] proposed setting the redundancy rate adaptively based on the actual loss rate for QUIC, using a simple XOR-based FEC method that does not handle bursty packet-loss patterns well. Michel et al. [19] first studied different FEC methods, such as XOR, Reed–Solomon (RS), and Random Linear Codes (RLCs), for file transfers. Their findings showed that applying FEC over QUIC can be beneficial for small files but is not very useful for larger bulk transfers, as they used a fixed rate of redundant data. Afterwards, in 2022, they extended that work [20] by considering three use cases: (1) bulk data transfer, (2) file transfers with restricted buffers, and (3) delay-constrained messages with an adjustable reliability level of the transfer. They also minimized the overhead of FEC by letting the application set the level of redundancy.
Michel et al. proposed QUIRL [21], which adds a parameter called MaxJitter to FEC over QUIC. If the application provides this parameter to define the maximum affordable data delivery delay, then FEC can be applied only to the necessary portion of the data, reducing the FEC overhead. QUIRL also adds a new ACK frame called SOURCE_SYMBOL_ACK to inform the sender that a lost frame has been recovered by FEC, to avoid sending its retransmission. The new ACK frame adds potential complexity to QUIRL’s integration with other parts of the CCA and the QUIC protocol.
Holzinger et al. [22] proposed two additional parameters, called burst loss tolerance (BLT) and repair delay tolerance (RDT), that can be set by the application to enable better weighted scheduling. They then proposed a new data scheduling algorithm based on resource priorities, and these new parameters were combined with FEC to reduce the completion time of web resources.
To the best of our knowledge, none of the previous studies considered a partial order delivery approach combined with FEC over QUIC to improve latency.

3. Use Case: Pixel-Based Screen Sharing Protocols

In this paper, we discuss the benefit of applying our partial delivery mechanism combined with FEC for a specific workload: pixel-based screen sharing protocols, such as VNC [23]. VNC is an open-source application in which a VNC server can share its screen with a VNC client/viewer by sending screen updates, while the client has control over mouse and keyboard inputs and can view applications running on the server node. VNC mostly follows the FrameBuffer update protocol [24] for screen updates, which contain the pixel data of the updated parts of the screen.
Each FrameBuffer update is a set of rectangles covering the updated parts of the screen. The VNC server sends a sequence of these updates to the client. The order of rectangles within each update is not important, but we need to keep the order of the updates in the sequence in which they were sent. That is why this kind of screen sharing protocol has partial ordering requirements, not total ordering. For example, if we have a sequence of five FrameBuffer updates (i.e., update1 to update5), each containing 100 rectangles, we can deliver the 100 rectangles of update1 in arrival order at the client side and then start delivering the 100 rectangles of update2 in arrival order. We may receive some of the rectangles of update2 while we are still delivering update1, so we need a buffer and a consistency fence between each pair of consecutive updates.
VNC runs over TCP, as it needs a reliable transport protocol. However, TCP enforces total ordering, while VNC needs only partial ordering. That means that running VNC over TCP does not allow us to deliver the rectangles of each update in arrival order. For example, if a rectangle in the middle of an update is lost, we need to wait for its retransmission before delivering the rest of the rectangles of that update. This waiting time is a function of the RTT, and the user notices it as a lag in the middle of the screen update, especially in large-RTT scenarios. To improve the smoothness of delivering screen updates, we propose replacing the underlying protocol (i.e., TCP) with one that allows us to relax the order of rectangles within each update (e.g., a UDP-based protocol) while keeping the order of updates in the sequence by using the CF model. Also, adding FEC helps us recover lost packets sooner than waiting for retransmissions, so we can finish delivering each update and start the next update sooner. We implemented this idea over QUIC as a UDP-based protocol.
The FrameBuffer update protocol is demand-driven, which means that the server sends a FrameBuffer update only after receiving an update request from the client. Although this design decision makes the protocol simple and adaptive to the client’s needs and network speed, it also means that the rate of screen updates is at most one update per RTT. Therefore, some high-performance VNC implementations, such as TigerVNC [25], add a feature to support continuous updates, in which the server sends as many screen updates as the network capacity allows (to avoid congestion) but at a higher rate than the original demand-driven version. This feature is useful for workloads with high update rates, such as playing videos.
There is a wide range of use cases over VNC, from interactive examples such as 3D CAD, photo/video editing, and web browsing to less interactive ones such as watching a video. The relevant performance metrics differ across these use cases. For example, response time is an important metric for interactive use cases, while for streaming video, the goal is reducing glitches and improving smoothness. We designed two kinds of synthetic workloads to evaluate the benefit of our approach over QUIC for these two ends of the spectrum:
  • Streaming updates: The client sends one request to receive a stream of screen updates from the server (Section 6.1).
  • Non-streaming updates: The client sends a request for the first update, then finishes downloading it, and after a while (e.g., 0.4 s) sends the request for the second update, and so on (Section 6.2).
Real workloads over VNC can be a combination of these two specific use cases. For example, we may use the mouse to click on a link or a file to watch a video and then use the mouse to open a window for editing text with the keyboard.
We need to deliver the screen updates in the same order that they have been sent from the server. In other words, there is a consistency fence between every two consecutive screen updates. However, the rectangles within each update can be delivered in arrival order. In Section 6, we evaluate and compare the two synthetic workloads mentioned above using total ordering, partial ordering, and partial ordering+FEC over QUIC.

4. Design and Implementation

We used ngtcp2 [12] in our benchmarking results, as it is one of the active, well-known, open-source implementations of QUIC in C/C++. Ngtcp2 has an active GitHub project and a forum where questions are answered quickly. In addition, several research articles have used ngtcp2 to benchmark the QUIC protocol [26,27,28]. Our implementation has two parts: (1) partial ordering for screen updates over QUIC and (2) adding FEC to ngtcp2. We explain each part in the next two sections.

4.1. Partial Ordering for Screen Updates over QUIC

We want to relax the ordering of the messages within each update but keep the order of the updates as they were sent. Unmodified QUIC cannot satisfy these partial ordering requirements of screen updates. The reason is that, to relax the ordering of messages within each update, we need to send them over different QUIC streams. However, since there is no ordering among streams, we cannot apply the same strategy to the messages of the next update and still ensure that, for example, we start delivering the messages of the second update only after completely delivering the messages of the first update.
We implemented the CF model over QUIC by sending each update over a separate QUIC stream. We keep a buffer at the client side to store messages that arrive early from later updates/streams. Also, we use the new user-level callback that we added to QUIC to pass frames to the HTTP layer in arrival order at the client side, so we can relax the order of messages within each update and apply FEC (discussed in the next section) to reduce the completion time of each update and start the next update sooner.
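The client-side logic can be sketched as follows. This is a minimal, hypothetical illustration (the names, and the simplifying assumption that streams are opened with consecutive identifiers in update order, are ours), not our actual ngtcp2-based code:

```cpp
#include <cstdint>
#include <deque>
#include <map>
#include <vector>

using Frame = std::vector<uint8_t>;

// Illustrative client-side buffering: one screen update per QUIC stream,
// with a consistency fence between consecutive updates.
class ScreenUpdateReceiver {
 public:
  explicit ScreenUpdateReceiver(size_t frames_per_update)
      : frames_per_update_(frames_per_update) {}

  void on_frame_arrival(int64_t stream_id, Frame f) {
    if (stream_id != current_stream_) {
      pending_[stream_id].push_back(std::move(f));  // hold until the fence
      return;
    }
    deliver(std::move(f));  // arrival order is fine within an update
    if (++delivered_in_update_ == frames_per_update_) advance_fence();
  }

 private:
  void advance_fence() {
    // The current update is complete: cross the fence and drain any
    // frames of the next update that arrived early.
    ++current_stream_;
    delivered_in_update_ = 0;
    auto it = pending_.find(current_stream_);
    if (it == pending_.end()) return;
    for (Frame& f : it->second) {
      deliver(std::move(f));
      ++delivered_in_update_;
    }
    pending_.erase(it);
    if (delivered_in_update_ == frames_per_update_) advance_fence();
  }

  void deliver(Frame f) { /* hand the rectangle pixels to the UI layer */ }

  size_t frames_per_update_;
  int64_t current_stream_ = 0;
  size_t delivered_in_update_ = 0;
  std::map<int64_t, std::deque<Frame>> pending_;
};
```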

4.2. Adding FEC to ngtcp2 [8]

In our previous paper on selectively applying FEC to HTTP resources over QUIC to reduce page-load time [8], we described our implementation adding support for an open-source FEC library called OpenFEC [9] to ngtcp2/nghttp3. In this paper, we can take advantage of our previous implementation for the screen-update workload as well, because we send each screen update over a QUIC stream. Therefore, each update can be seen as a resource that we can apply FEC to. Here, we briefly explain our implementation of adding FEC over QUIC.
We implemented both the encoding and decoding procedures at the HTTP layer, within the nghttp3 library [13], to minimize the modifications to QUIC as much as possible. We imported the OpenFEC library into nghttp3 to use its encoding/decoding functions. The OpenFEC library supports three erasure codes: (1) a two-dimensional XOR matrix, (2) Reed–Solomon (RS), and (3) Low-Density Parity Check (LDPC). For each code, there are three main parameters: (1) k, the number of original symbols; (2) n, the sum of the numbers of original and redundant symbols; and (3) the symbol size. In other words, (n − k) × SYMBOL_SIZE is the extra overhead of FEC, i.e., the redundant data that need to be sent along with the original data to be used for loss recovery at the destination.
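As a worked example of this sizing, using our own workload numbers from Section 5 and Section 6 (k = 100 and 140 KB updates, so the symbol size is 1400 bytes):

```cpp
#include <cstdio>

// Worked example of the FEC sizing used in this paper: k source symbols
// per update, n - k repair symbols, symbol size = update size / k.
int main() {
  const int k = 100;                         // source symbols per update
  const int update_bytes = 140 * 1000;       // 140 KB update (Section 6)
  const int symbol_size = update_bytes / k;  // 1400 bytes per symbol
  for (int n : {103, 105, 110}) {
    int overhead = (n - k) * symbol_size;    // redundant bytes per update
    std::printf("n=%d: %d repair bytes (%.0f%% of the update)\n",
                n, overhead, 100.0 * overhead / update_bytes);
  }
  return 0;
}
```

For n = 103, this gives 4200 repair bytes (4.2 KB, or 3% of the update), matching the overhead quantified in Section 6.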
In this paper, we used RS for all of our experiments. The OpenFEC developers recommend RS when n is less than 255 and LDPC for larger values of n. We tested k = 100 and n ∈ {103, 105, 110} for different redundancy rates based on the packet-loss rates; tuning other values of n and k based on dynamic packet-loss monitoring, and an extensive comparison of different FEC methods such as RS and LDPC, remain future work.
Fortunately, ngtcp2 supports the Extensible Prioritization Scheme [29], so the client can set a parameter called u for the urgency of each resource: u = 0 means the highest urgency, and u = 7 means the lowest. In our implementation, if a resource is given a u value less than 3, we apply FEC to it as a high-priority resource; for values of 3 and larger, we do not apply FEC. In this way, we can enable and disable FEC for the partial delivery mode and study its impact in the evaluation discussed in Section 6. Figure 4 shows an example of running the HTTP/3 client of ngtcp2 to request two updates with FEC enabled (U1 and U2).
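The FEC on/off decision is simply a threshold test on the urgency value; a minimal sketch of the rule above (the helper name is hypothetical, not from ngtcp2):

```cpp
// Extensible Prioritization Scheme urgency: u = 0 (highest) to u = 7 (lowest).
// Per our rule, u < 3 marks a resource as high priority, so FEC is applied;
// u >= 3 disables FEC for that resource.
bool fec_enabled_for(int urgency) { return urgency < 3; }
```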
For the scenario in Figure 4, the server knows that U1 and U2 need encoding based on their u values. The server calls the encoder and sends redundant frames after the original frames/messages. At the client side, we need to process the frames in arrival order to be able to apply decoding and recover any lost original frames. However, unmodified QUIC (as a reliable protocol) only passes the data of each stream to the HTTP client in total order. Therefore, the only part of the QUIC layer of ngtcp2 that we modified was adding a new user-level callback to receive QUIC frames in arrival order at the HTTP layer, so we can attempt decoding to recover lost frames rather than waiting for retransmissions at the end of the download. Whenever the decoder finishes recovering the whole update, the client can use ngtcp2_conn_shutdown_stream_read() to inform the server to stop sending.
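A hedged sketch of this decode-on-arrival flow follows. The callback name and decoder interface are hypothetical stand-ins for our glue code, not real ngtcp2/nghttp3 or OpenFEC APIs; only ngtcp2_conn_shutdown_stream_read(), referenced in the comment, is an actual ngtcp2 call:

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Hypothetical per-update decoder state wrapping OpenFEC (illustrative).
struct FecDecoder {
  size_t k = 100;                        // source symbols per update
  std::unordered_set<uint32_t> symbols;  // distinct symbols seen so far

  // Feed one QUIC frame (source or repair symbol) as it arrives.
  void add_symbol(const uint8_t* /*data*/, size_t /*len*/, uint32_t id) {
    symbols.insert(id);  // stub: a real decoder would also store the data
  }
  // With an ideal (MDS) code such as RS, any k distinct symbols suffice.
  bool update_fully_recovered() const { return symbols.size() >= k; }
};

// Hypothetical user-level callback invoked for frames in ARRIVAL order
// (unmodified ngtcp2 surfaces stream data only in byte order).
void on_frame_in_arrival_order(int64_t stream_id, const uint8_t* data,
                               size_t len, uint32_t symbol_id,
                               FecDecoder& dec) {
  dec.add_symbol(data, len, symbol_id);
  if (dec.update_fully_recovered()) {
    // The whole update is decodable: stop reading this stream so the
    // server stops sending retransmissions and leftover repair symbols,
    // e.g., via ngtcp2_conn_shutdown_stream_read() as described above.
    (void)stream_id;
  }
}
```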
There can be lost frames in the middle of the transfer for which we have enough time to receive retransmissions before the end of the update’s download, so FEC recovery is not needed for them. However, lost frames in the last cwnd can benefit from FEC recovery, reducing the download completion time by an RTT. Our implementation modified neither the congestion control algorithm nor the retransmission process of the QUIC layer, and it had no impact on fairness. As a result, our approach avoids potential performance problems and added complexity in the current network layers, as also discussed in [30].

5. Evaluation Setup on Emulab

We used the Emulab testbed [31,32] for all of our experiments. To create an Emulab experiment, one needs to create an account and write a profile describing the specification of the topology and settings. The Emulab profile used for all of our experiments is available on GitHub [33]. Figure 5 shows the topology used in our experiments, with two nodes as the sender/server and the receiver/client and an Emulab bridge node (in the middle) that runs DummyNet [34]. DummyNet is a FreeBSD system facility that emulates network delay, packet-loss rate, and bandwidth. We ran a provided configuration script called delay_config on the bridge node to set the network parameters.
For example, we used the command in Figure 6 to set a 100 ms RTT (i.e., DELAY = 100), a 1% loss rate (i.e., PLR = 0.01), and 500 Mbps of bandwidth (i.e., BANDWIDTH = 500,000) on the bridge node.
We also set the DummyNet queue size to 9 MB (i.e., LIMIT = 9,000,000) to have enough buffer based on the Bandwidth-Delay Product (BDP) of the link, as recommended [35]. One important point is that we cannot set a limit higher than DummyNet’s default maximum of 1,048,576 B. Therefore, we first needed to run a command to increase that maximum threshold (e.g., to 9 MB in Figure 7).
For the client and server nodes, we used the public Ubuntu 22.04 image with kernel version 5.15.0-86-generic on two d430 nodes of Emulab. These nodes have two 2.4 GHz 64-bit 8-core Xeon E5-2630v3 processors, a 20 MB cache, 64 GB of RAM, and 10 GbE NICs. We implemented the FEC encoding and decoding in the HTTP layer (i.e., nghttp3), using several functions of the OpenFEC library. Therefore, we needed to modify the configure command of nghttp3 to link against OpenFEC. Figure 8 shows the modified configure command.
In our experiments, we set n to {103, 105, 110} and k to 100, and the symbol size was the update size divided by k.

6. Empirical Results on Emulab

The workloads that we evaluated with the CF model were described in the previous section. We had a sequence of N updates, i.e., update1, update2, …, updateN. Each update had a set of messages (i.e., rectangles for VNC). We needed to keep the order of updates, so we delivered update1 completely, then started delivering update2, and so on. Therefore, we needed a CF between every two updates. The delivery order of messages within each update was not important. We evaluated the latency of our CF implementation (i.e., partial ordering), with and without FEC, over QUIC compared to total ordering in an emulated testbed on Emulab.
For the workload, we sent 300 updates, with 100 messages of 1400 bytes per update, for a total of 140 KB per update and 42 MB for all 300 updates. We set the loss rate to {0%, 0.1%, 0.5%, 1%}, the RTT to {50 ms, 100 ms, 200 ms}, the bandwidth to {10 Mbps, 100 Mbps, 1 Gbps}, and the CCA to {BBR, CUBIC}. We summarize these evaluation parameters in Table 1.
As discussed earlier, our focus is not on exploring different FEC codes nor on sophisticated ways to adjust the amount of redundant data used. However, we can quantify the overheads of FEC as follows: For n/k = 1.03 (details below), each 140 KB update requires 4.2 KB (i.e., 3%) of overhead. For n/k = 1.05 and n/k = 1.1, the overheads are 5% and 10%, respectively, as expected based on the n/k ratio. Also, when the packet-loss rate is 0%, these redundant data are pure overhead, since no packets are ever lost.
We chose a baseline setting to be able to compare and study the impact of different parameters in Table 1 separately. The baseline setting was a 0.5% packet-loss rate, 100 ms RTT, 100 Mbps bandwidth, and BBR. Therefore, to study the impact of each parameter, we kept all the other parameters the same as the baseline setting and only varied that specific one.
Although the workload used is not a full-fledged application, the overall results of this evaluation show the following:
  • Partial+FEC reduced message-blocking and message inter-delivery time for the streaming updates workload compared to total and partial delivery (Section 6.1). For example, the 99-percentile of message-blocking time for partial+FEC (n/k = 1.05) was 0.4 ms, while the 80-percentile of total was already 101 ms, and the 99-percentile was 230 ms (baseline setting: 0.5% loss, 100 ms RTT, 100 Mbps, BBR). See Figure 9.
  • Partial+FEC reduced the update response time, message inter-delivery time, and completion time for the non-streaming updates workload compared to total and partial delivery (Section 6.2). For example, the 99-percentile of the update response time for partial+FEC (n/k = 1.05) was 162 ms, while the 99-percentile of total was 325 ms. Also, the completion time of partial+FEC (n/k = 1.05) was 137 s vs. 149 s for total (baseline setting: 0.5% loss, 100 ms RTT, 100 Mbps, BBR). See Figure 18.
  • The latency improvements of partial+FEC grew as a function of the packet-loss rate for both the streaming and non-streaming workloads. Higher packet-loss rates mean waiting for retransmissions more frequently with total delivery, which hurts the latency metrics. For example, with a 1% packet-loss rate (double the baseline loss) and the streaming workload, the 99-percentile of the message-blocking time for partial+FEC (n/k = 1.1) was 0.4 ms, while the 70-percentile of total was already 99 ms, and the 99-percentile was 264 ms (Figure 11).
  • The latency improvements of partial+FEC grew as a function of the RTT of the network for both the streaming and non-streaming workloads. A higher RTT means a longer waiting time for each retransmission with total delivery, which FEC lets us avoid. For example, with a 200 ms RTT (double the baseline RTT) and the streaming workload, the 99-percentile of the message-blocking time for partial+FEC (n/k = 1.1) was 0.4 ms, while the 80-percentile of total was already 225 ms, and the 99-percentile was 429 ms (Figure 13).
We evaluated two metrics for each workload and present them in the next two sections. In Section 6.1, we measure the message-blocking times and message inter-delivery times for the streaming updates workload. The message-blocking time is the time from receiving a message to its delivery, during which it is blocked in the client’s buffer. The message inter-delivery time is the time between delivering two consecutive messages. A noticeably long message inter-delivery time can have a negative impact on the user experience, appearing as a glitch in that screen update.
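For clarity, both metrics can be computed directly from per-message receive and delivery timestamps; a minimal sketch under that assumption (the struct and names are ours, for illustration):

```cpp
#include <vector>

struct MsgTimes {
  double recv_s;     // when the message arrived at the client
  double deliver_s;  // when it was delivered to the application
};

// Message-blocking time: how long each message sat in the client's
// buffer between arrival and delivery.
std::vector<double> blocking_times(const std::vector<MsgTimes>& msgs) {
  std::vector<double> out;
  for (const auto& m : msgs) out.push_back(m.deliver_s - m.recv_s);
  return out;
}

// Message inter-delivery time: the gap between consecutive deliveries
// (msgs assumed sorted by delivery time); large gaps appear as glitches.
std::vector<double> inter_delivery_times(const std::vector<MsgTimes>& msgs) {
  std::vector<double> out;
  for (size_t i = 1; i < msgs.size(); ++i)
    out.push_back(msgs[i].deliver_s - msgs[i - 1].deliver_s);
  return out;
}
```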
In each figure, we compare four delivery options: (1) total, delivering the messages in the exact order in which they were sent; (2) partial, delivering the messages within each update in arrival order while keeping the sequence of updates as they were sent; and (3) and (4) partial+FEC with two different redundancy rates, which are the same as partial but use FEC to recover lost messages sooner.
In Section 6.2, we measure the update response times and message inter-delivery times for the non-streaming updates workload. Response time was introduced in [36] as an important metric for interactive performance over VNC. The response time is the time from an input (e.g., mouse or keyboard) at the client to the time we finish downloading the corresponding update.

6.1. Streaming Updates

The streaming updates workload sends all updates (e.g., 300 updates in our experiments) without any gap after receiving a single request from the client, like the frames of a video. The second workload, non-streaming updates, will be discussed in Section 6.2. In Figure 9, we present the Cumulative Distribution Function (CDF) of the message-blocking time in Figure 9a and the scatter plot of the message inter-delivery time in Figure 9b for the baseline setting, which is a 0.5% packet-loss rate, 100 ms RTT, 100 Mbps bandwidth, and BBR. The vertical lines in Figure 9b show the completion time of each experiment to deliver all 300 updates. We also mention the exact completion times in the caption, in addition to the number of points above 100 ms (i.e., the RTT) for each delivery method. We present the same results for varying the packet-loss rate in Figure 10 and Figure 11, varying the RTT in Figure 12 and Figure 13, varying the bandwidth in Figure 14 and Figure 15, and varying the CCA in Figure 16. We discuss all of these results in the Section “Discussion of the Results of the Streaming Updates Workload”.

Discussion of the Results of the Streaming Updates Workload

  • Partial order+FEC reduced the message-blocking time and inter-delivery time with the baseline setting (Figure 9):
    In Figure 9a, we show the CDF of the message-blocking time, in which using partial+FEC with both a 5% redundancy rate (i.e., red, n/k = 1.05) and 10% (i.e., black, n/k = 1.1) yielded a 99-percentile time of 0.4 ms. However, the 80-percentile of total (i.e., blue) was already 101.08 ms, and the 99-percentile was 230 ms.
    Also, in Figure 9b, the total and partial delivery methods had higher message inter-delivery times compared to partial+FEC because they needed to wait for at least an RTT to receive the retransmissions. The 99.95-percentile of the message inter-delivery time was 135 ms for blue, 102 ms for gold, 6 ms for red, and 9 ms for black.
  • Partial order+FEC improvements increased as the packet-loss rate and RTT grew (Figure 9, Figure 10, Figure 11, Figure 12 and Figure 13):
    Increasing the packet-loss rate from 0.1% shown in Figure 10 to 0.5% shown in Figure 9 to 1% shown in Figure 11 shows that partial+FEC can keep the message-blocking and message inter-delivery times close to zero even at a 1% packet-loss rate, while the latency results became worse for total and partial delivery (i.e., blue and gold). For example, the 70-percentile of the message-blocking time for total shown in Figure 11 was 99.14 ms.
    Also, as we increased RTT from 50 ms shown in Figure 12 to 100 ms shown in Figure 9 to 200 ms shown in Figure 13, partial+FEC could keep the latency numbers close to zero independent of the RTT, while the message-blocking and message inter-delivery times of total and partial grew as a function of the RTT. For example, the 70-percentile of the message-blocking time for total shown in Figure 13 was 191.88 ms.
  • Partial order+FEC improvements remained across different bandwidth values (Figure 9, Figure 14 and Figure 15):
    Partial+FEC helped reduce both the message-blocking and message inter-delivery times for all three bandwidth values in our experiments, from 10 Mbps shown in Figure 14, to 100 Mbps shown in Figure 9, to 1 Gbps shown in Figure 15. For example, the number of messages with an inter-delivery time higher than 100 ms was 71, 59, and 46 for total vs. 1 or less for partial+FEC at 10 Mbps shown in Figure 14b, 100 Mbps shown in Figure 9b, and 1 Gbps shown in Figure 15b, respectively.
  • Partial order+FEC reduced latency with both the BBR and CUBIC CCAs (Figure 9 and Figure 16):
    Figure 16 shows the message-blocking and message inter-delivery times for the CUBIC CCA. Compared to the results with BBR and the same other network parameters (i.e., loss, RTT, and bandwidth) shown in Figure 9, we can see an increase in the message inter-delivery time of partial+FEC in Figure 16, but the results are still better than total and partial delivery. The reason is that CUBIC had a smaller congestion window than BBR in the presence of packet loss (i.e., 0.5%), so we needed more time to receive the redundant packets at the end of each update to be able to recover lost packets [8].
  • Using the packet-loss rate to set the redundancy rate of partial order+FEC (Figure 17):
    The appropriate redundancy rate depends on the packet-loss rate. We evaluated three redundancy rates in our experiments: 1.05 and 1.1 for all of the results except for the 0.1% packet-loss rate shown in Figure 10, which used the n/k = 1.03 and n/k = 1.05 rates (where n/k = 1.03 differs from the other experiments). Although a high redundancy rate such as 1.1 (i.e., black) yielded close to zero message-blocking time in all of the experiments, the 1.05 rate (i.e., red) gave the same improvement in message inter-delivery time as the 1.1 rate and also a lower completion time because of its lower overhead. Figure 17 shows the FEC overhead of the n/k = 1.05 and n/k = 1.1 rates in terms of the message-blocking and message inter-delivery times when we set the packet-loss rate to 0%. The measured overhead for the message inter-delivery time is less than 10 ms, as seen in Figure 17b.
In summary, for the streaming workload, partial order+FEC helped reduce the message-blocking and inter-delivery times across a wide range of network parameters compared to total ordering. For the baseline setting, we reduced the 99-percentile of the message-blocking time by about 230 ms compared to total delivery, as seen in Figure 9 (0.4 ms vs. 230 ms). These improvements grew as we increased the packet-loss rate and RTT, and we also obtained a larger improvement for the BBR CCA than for CUBIC.

6.2. Non-Streaming Updates

The non-streaming updates workload sends each update only after receiving a request from the client. These requests can be any input from the client, such as a mouse click or a key press. We set the gap between every two inputs to 300 ms in our experiments, based on an example of a user-input sequence explained in [36].
In Figure 18, we present the CDF of the update response time in Figure 18a and the scatter plot of the message inter-delivery time in Figure 18b for the baseline setting, which was a 0.5% packet-loss rate, 100 ms RTT, 100 Mbps bandwidth, and BBR. The vertical lines in Figure 18b show the completion time of each experiment to deliver all 300 updates. We also mention the exact completion times in the caption, in addition to the number of points above the RTT for each delivery method. We present the same results for varying the packet-loss rate in Figure 19 and Figure 20, varying the RTT in Figure 21 and Figure 22, varying the bandwidth in Figure 23 and Figure 24, and varying the CCA in Figure 25. We discuss all of these results in the Section “Discussion of the Results of the Non-Streaming Updates Workload”.

Discussion of the Results of the Non-Streaming Updates Workload

  • Partial order+FEC reduced the update response time and inter-delivery time with the baseline setting (Figure 18):
    The CDF of the update response time in Figure 18a shows that, using partial+FEC with a 5% redundancy rate (i.e., red, n/k = 1.05), the 99-percentile was 161.82 ms, while the 99-percentile of total (i.e., blue) was 325.26 ms. Increasing the redundancy rate to 10% (i.e., black, n/k = 1.1) did not make a significant change, as 5% was enough for loss recovery.
    As shown in Figure 18b, the 99.95-percentile of the message inter-delivery time was 113.5 ms for blue, 97.91 ms for gold, 1.47 ms for red, and 1.79 ms for black. Furthermore, the completion time of partial+FEC with 5% redundancy was 11 s less than that of total.
  • Partial order+FEC improvements increased as the packet-loss rate and RTT grew (Figure 18, Figure 19, Figure 20, Figure 21 and Figure 22):
    Increasing the packet-loss rate from 0.1% shown in Figure 19 to 0.5% shown in Figure 18 to 1% shown in Figure 20 showed greater improvements for partial+FEC in both the update response and message inter-delivery times. For example, as shown in Figure 20, the 99-percentile of the update response time was 164.19 ms for partial+FEC with a 10% redundancy rate (i.e., black, n/k = 1.1), 316.94 ms for partial+FEC (n/k = 1.05, red), 318.02 ms for partial (i.e., gold), and 319.88 ms for total (i.e., blue). The 5% redundancy rate of partial+FEC (n/k = 1.05) resulted in a lower recovery rate compared to partial+FEC (n/k = 1.1) at the high packet-loss rate of 1%.
    Also, as we increased the RTT from 50 ms shown in Figure 21 to 100 ms shown in Figure 18 to 200 ms shown in Figure 22, the improvement of partial+FEC in update response time and message inter-delivery time over total and partial delivery grew as a function of the RTT. For example, as shown in Figure 22, the 99-percentile of the update response time was 272 ms for both partial+FEC (n/k = 1.05) and partial+FEC (n/k = 1.1), while it was 422 ms for total delivery.
  • Partial order+FEC improvements remained across different bandwidth values (Figure 18, Figure 23, and Figure 24):
    Partial+FEC helped reduce both the update response and message inter-delivery times for all three bandwidth values in our experiments, from 10 Mbps shown in Figure 23, to 100 Mbps shown in Figure 18, to 1 Gbps shown in Figure 24. For example, the number of messages with an inter-delivery time higher than 100 ms was 79, 61, and 72 for total vs. 2 or less for partial+FEC at 10 Mbps shown in Figure 23b, 100 Mbps shown in Figure 18b, and 1 Gbps shown in Figure 24b, respectively.
  • Partial order+FEC reduced latency with both the BBR and CUBIC CCAs (Figure 18 and Figure 25):
    Figure 25 shows the update response and message inter-delivery times for the CUBIC CCA. Compared to the results with BBR and the same other network parameters (i.e., loss, RTT, and bandwidth) in Figure 18, we can see an increase in the message inter-delivery time of partial+FEC in Figure 25, but the results are still better than total and partial delivery. For example, the number of points above 100 ms in Figure 25b is 68 for blue, 13 for gold, 0 for red, and 0 for black.
  • FEC Overhead:
    Figure 26 shows the FEC overhead of the n/k = 1.05 and n/k = 1.1 rates in terms of the update response and message inter-delivery times when we set the packet-loss rate to 0%. Partial+FEC had no considerable negative impact on latency, especially with n/k = 1.05. Comparing Figure 26b and Figure 17b shows that the FEC overhead had less impact on the latency of the non-streaming workload. The reason is that, since there was a 300 ms gap between every two consecutive updates in the non-streaming workload, the overhead of FEC for each update did not affect the next update.
In summary, for the non-streaming workload, partial order+FEC helped reduce the update response time, message inter-delivery time, and completion time across a wide range of network parameters compared to total ordering. For the baseline setting, we reduced the 99-percentile of the update response time by 50% compared to total delivery, as shown in Figure 18 (162 ms vs. 325 ms). These improvements grew as we increased the packet-loss rate and RTT, and we also obtained a larger improvement for the BBR CCA than for CUBIC.

7. Concluding Remarks

In this paper, we described the design and implementation of a partial-order delivery system over QUIC for workloads with a sequence of updates, each of which contains messages without strict ordering dependencies. For example, a typical VNC runs over TCP, whose totally ordered delivery adds unnecessary blocking time. Unmodified QUIC supports a kind of partial ordering, based on streams, which does not fit the screen updates workload. We also combined our partial ordering design with FEC to further improve latency and reduce blocking time by not waiting a time proportional to the RTT for retransmissions.
We used two different synthetic workloads: (1) streaming updates, in which the sender sends a sequence of updates without gaps, such as the frames of a video, and (2) non-streaming updates, in which the sender sends each update after an input (request) from the client. Through our evaluation with a range of parameters (e.g., packet-loss rate, RTT, bandwidth, and CCA) on the Emulab testbed, we showed that our proposal achieved latency improvements compared to total delivery for both workloads.
For the baseline setting (0.5% loss, 100 ms RTT, 100 Mbps, BBR) with the streaming workload, we reduced the 99-percentile of the message-blocking time by about 230 ms compared to total delivery (Figure 9) (0.4 ms vs. 230 ms). With the non-streaming workload, we reduced the 99-percentile of the update response time by 50% compared to total delivery (Figure 18) (162 ms vs. 325 ms). These improvements grew as we increased the packet-loss rate and RTT.
We started with synthetic workloads to establish the basic benefits of adding partial ordering and FEC to QUIC for low-latency applications. Admittedly, real-world scenarios will be more complicated due to contention between multiple data streams and scalability issues. Shared networks, like the wider Internet outside of testbeds such as Emulab, have multiple users, multiple protocols in use (e.g., TCP, UDP, CUBIC CCA, and BBR CCA), and multiple applications.
One contribution of our empirical analysis is a quantification of the benefits of partial ordering and FEC for both the CUBIC and BBR CCAs, with BBR benefiting even more than CUBIC from FEC. Given the fundamental differences between CUBIC and BBR (e.g., loss-based vs. measurement-/model-based designs), it was not obvious beforehand what the benefit of our proposals would be. Furthermore, with the growth in the use of BBR due to Google’s engineering efforts, that result is useful for future QUIC-based use cases.
As well, real-world scenarios that push the scalability of networked applications (e.g., larger numbers of users, concurrent VNC sessions, and competing data streams) are of particular interest. Contention for networks, especially at bottleneck links, results in more packet loss, complicated bursty loss patterns, and greater chances of even retransmitted packets being lost. Our current results establish a baseline benefit from partial ordering and FEC across a range of network parameters (e.g., packet-loss rate, bottleneck bandwidth, and CCA), and we plan future experiments with more dynamic and high-contention scenarios.

Author Contributions

Methodology, N.E.; software, N.E.; experiments, N.E.; writing—original draft preparation and editing, N.E.; writing—review and editing, P.L.; supervision, P.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding. The APC was funded by Dr. Paul Lu.

Data Availability Statement

The implementation and workloads supporting the conclusions of this article will be made available by the authors upon request.

Acknowledgments

We thank Emulab for generous access to their testbed.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Connolly, T.; Amer, P.; Conrad, P. An Extension to TCP: Partial Order Service; Internet RFC1693; 1994. Available online: https://datatracker.ietf.org/doc/html/rfc1693 (accessed on 18 June 2025).
  2. McKusick, M.K.; Quinlan, S. GFS: Evolution on Fast-forward: A discussion between Kirk McKusick and Sean Quinlan about the origin and evolution of the Google File System. Queue 2009, 7, 10–20. [Google Scholar] [CrossRef]
  3. Pooya, S.; Lu, P.; MacGregor, M.H. Structured Message Transport. In Proceedings of the 2012 IEEE 31st International Performance Computing and Communications Conference (IPCCC), Austin, TX, USA, 1–3 December 2012; pp. 432–439. [Google Scholar]
  4. Eghbal, N.; Lu, P. Consistency Fences for Partial Order Delivery to Reduce Latency. In Proceedings of the International Conference on Computational Science, Omaha, NE, USA, 27–29 June 2022; Springer: Cham, Switzerland, 2022; pp. 488–501. [Google Scholar]
  5. Mosberger, D. Memory consistency models. ACM SIGOPS Oper. Syst. Rev. 1993, 27, 18–26. [Google Scholar] [CrossRef]
  6. Gu, Y.; Grossman, R.L. UDT: UDP-based data transfer for high-speed wide area networks. Comput. Netw. 2007, 51, 1777–1799. [Google Scholar] [CrossRef]
  7. Langley, A.; Riddoch, A.; Wilk, A.; Vicente, A.; Krasic, C.; Zhang, D.; Yang, F.; Kouranov, F.; Swett, I.; Iyengar, J.; et al. The quic transport protocol: Design and internet-scale deployment. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication, Los Angeles, CA, USA, 21–25 August 2017; pp. 183–196. [Google Scholar]
  8. Eghbal, N.; Ciotto Pinton, G.; Muthuraj, N.; Lu, P. QUIC-SFEC: Lower Latency QUIC for Resource Dependencies Using Forward Error Correction; preprint. [CrossRef]
  9. OpenFEC. 2025. Available online: http://openfec.org/ (accessed on 18 June 2025).
  10. Stewart, R.; Metz, C. SCTP: New transport protocol for TCP/IP. IEEE Internet Comput. 2001, 5, 64–69. [Google Scholar] [CrossRef]
  11. List of QUIC Implementations. 2025. Available online: https://github.com/quicwg/base-drafts/wiki/Implementations (accessed on 18 June 2025).
  12. ngtcp2. ngtcp2 Project Is an Effort to Implement IETF QUIC Protocol. 2025. Available online: https://github.com/ngtcp2/ngtcp2 (accessed on 18 June 2025).
  13. nghttp3. HTTP/3 Library Written in C. 2025. Available online: https://github.com/ngtcp2/nghttp3 (accessed on 18 June 2025).
  14. Sundararajan, J.K.; Shah, D.; Médard, M.; Mitzenmacher, M.; Barros, J. Network coding meets TCP. In Proceedings of the IEEE INFOCOM 2009, Rio de Janeiro, Brazil, 19–25 April 2009; pp. 280–288. [Google Scholar]
  15. Kim, M.; Cloud, J.; ParandehGheibi, A.; Urbina, L.; Fouli, K.; Leith, D.; Médard, M. Network Coded tcp (ctcp). arXiv 2012, arXiv:1212.2291. [Google Scholar]
  16. Hamilton, R.; Iyengar, J.; Swett, I.; Wilk, A. QUIC: A UDP-Based Secure and Reliable Transport for HTTP/2. Internet-Draft draft-hamilton-early-deployment-quic-00; 2016. Available online: https://datatracker.ietf.org/doc/html/draft-tsvwg-quic-protocol-02 (accessed on 18 June 2025).
  17. Kakhki, A.M.; Jero, S.; Choffnes, D.; Nita-Rotaru, C.; Mislove, A. Taking a long look at QUIC: An approach for rigorous evaluation of rapidly evolving transport protocols. In Proceedings of the 2017 Internet Measurement Conference, London, UK, 1–3 November 2017; pp. 290–303. [Google Scholar]
  18. Garrido, P.; Sanchez, I.; Ferlin, S.; Aguero, R.; Alay, O. rQUIC: Integrating FEC with QUIC for robust wireless communications. In Proceedings of the 2019 IEEE Global Communications Conference (GLOBECOM), Waikoloa, HI, USA, 9–13 December 2019; pp. 1–7. [Google Scholar]
  19. Michel, F.; De Coninck, Q.; Bonaventure, O. QUIC-FEC: Bringing the benefits of Forward Erasure Correction to QUIC. In Proceedings of the IEEE 2019 IFIP Networking Conference (IFIP Networking), Warsaw, Poland, 20–22 May 2019; pp. 1–9. [Google Scholar]
  20. Michel, F.; Cohen, A.; Malak, D.; De Coninck, Q.; Médard, M.; Bonaventure, O. FlEC: Enhancing QUIC with application-tailored reliability mechanisms. IEEE/ACM Trans. Netw. 2022, 31, 606–619. [Google Scholar] [CrossRef]
  21. Michel, F.; Bonaventure, O. QUIRL: Flexible QUIC Loss Recovery for Low Latency Applications. IEEE/ACM Trans. Netw. 2024, 32, 5204–5215. [Google Scholar] [CrossRef]
  22. Holzinger, K.; Petri, D.; Lachnit, S.; Kempf, M.; Stubbe, H.; Gallenmüller, S.; Günther, S.; Carle, G. Forward Error Correction and Weighted Hierarchical Fair Multiplexing for HTTP/3 over QUIC. In Proceedings of the IEEE 2025 IFIP Networking Conference (IFIP Networking), Limassol, Cyprus, 26–29 May 2025; pp. 1–9. [Google Scholar]
  23. Richardson, T.; Stafford-Fraser, Q.; Wood, K.R.; Hopper, A. Virtual network computing. IEEE Internet Comput. 1998, 2, 33–38. [Google Scholar] [CrossRef]
  24. Richardson, T.; Wood, K.R. The rfb Protocol; ORL: Cambridge, UK, 1998. [Google Scholar]
  25. TigerVNC. TigerVNC Project. Available online: https://www.tigervnc.org (accessed on 18 June 2025).
  26. Marx, R.; De Decker, T.; Quax, P.; Lamotte, W. Resource Multiplexing and Prioritization in HTTP/2 over TCP Versus HTTP/3 over QUIC. In Proceedings of the Web Information Systems and Technologies: 15th International Conference, WEBIST 2019, Vienna, Austria, 18–20 September 2019; Revised Selected Papers 15; Springer: Cham, Switzerland, 2020; pp. 96–126. [Google Scholar]
  27. Hasselquist, D.; Lindström, C.; Korzhitskii, N.; Carlsson, N.; Gurtov, A. Quic throughput and fairness over dual connectivity. Comput. Netw. 2022, 219, 109431. [Google Scholar] [CrossRef]
  28. Endres, S.; Deutschmann, J.; Hielscher, K.S.; German, R. Performance of QUIC implementations over geostationary satellite links. arXiv 2022, arXiv:2202.08228. [Google Scholar]
  29. Oku, K.; Pardue, L. Extensible Prioritization Scheme for HTTP; Work in Progress, Internet-Draft, draft-ietf-httpbis; 2020; Volume 1. Available online: https://datatracker.ietf.org/doc/rfc9218/ (accessed on 18 June 2025).
  30. Mittal, R.; Shpiner, A.; Panda, A.; Zahavi, E.; Krishnamurthy, A.; Ratnasamy, S.; Shenker, S. Revisiting network support for RDMA. In Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication, Budapest, Hungary, 20–25 August 2018; pp. 313–326. [Google Scholar]
  31. Emulab. 2025. Available online: https://www.emulab.net (accessed on 18 June 2025).
  32. White, B.; Lepreau, J.; Stoller, L.; Ricci, R.; Guruprasad, S.; Newbold, M.; Hibler, M.; Barb, C.; Joglekar, A. An Integrated Experimental Environment for Distributed Systems and Networks. In Proceedings of the Fifth Symposium on Operating Systems Design and Implementation, USENIX Association, Boston, MA, USA, 9–11 December 2002; pp. 255–270. [Google Scholar]
  33. Eghbal, N. Emulab Profile. Available online: https://github.com/nEghbal/netem_dummynet/blob/main/profile.py (accessed on 18 June 2025).
  34. Dummynet. 2025. Available online: https://man.freebsd.org/cgi/man.cgi?dummynet (accessed on 18 June 2025).
  35. Sander, C.; Kunze, I.; Wehrle, K. Analyzing the Influence of Resource Prioritization on HTTP/3 HOL Blocking and Performance. In Proceedings of the 2022 Network Traffic Measurement and Analysis Conference (TMA), Enschede, The Netherlands, 27–30 June 2022; pp. 1–10. [Google Scholar]
  36. Zeldovich, N.; Chandra, R. Interactive Performance Measurement with VNCPlay. In Proceedings of the USENIX Annual Technical Conference, FREENIX Track, Anaheim, CA, USA, 10–15 April 2005; pp. 189–198. [Google Scholar]
Figure 1. Dependency graph (based on Figure 1 in [4]).
Figure 2. Total vs. partial delivery (based on Figure 2 in [4]).
Figure 3. Consistency fences (based on Figure 3 in [4]).
Figure 4. Example ngtcp2 command line, client [8].
Figure 5. Experimental topology obtained from Emulab GUI [8].
Figure 6. Example configuration command line, bridge node: bandwidth, packet-loss rate, delay/latency, buffer units, buffer size.
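Because the configuration command in Figure 6 is likewise an image, the following sketch shows how DummyNet pipes are typically configured on a FreeBSD bridge node through ipfw; the pipe number is illustrative, and the values match the baseline setting (100 Mbps, 0.5% loss, 50 ms one-way delay for a 100 ms RTT) rather than the authors' exact buffer choices. The queue keyword accepts either a slot count or a byte size (e.g., queue 512Kbytes), which covers the caption's buffer units and buffer size.

  $ sudo ipfw pipe 110 config bw 100Mbit/s plr 0.005 delay 50ms queue 100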
Figure 7. Extend maximum limit of DummyNet.
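On stock FreeBSD, the per-pipe queue ceilings that Figure 7 raises are DummyNet sysctls; the knob names below are the standard DummyNet limits, but the values are illustrative rather than the authors'. A pipe configuration requesting a queue beyond these limits will not take effect, which matters when emulating large bandwidth-delay products.

  $ sudo sysctl net.inet.ip.dummynet.pipe_slot_limit=1000
  $ sudo sysctl net.inet.ip.dummynet.pipe_byte_limit=10485760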
Figure 8. nghttp3 build configuration: use OpenFEC library [8].
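Upstream nghttp3 has no FEC option, so the build configuration in Figure 8 belongs to the authors' modified tree. Purely as a sketch under that assumption, a CMake-based nghttp3 build can be pointed at an installed OpenFEC with generic include and linker flags; the /usr/local prefix and the way the patched sources consume the library are assumptions, not documented upstream behavior.

  $ cd nghttp3
  $ cmake -B build -DCMAKE_C_FLAGS="-I/usr/local/include" -DCMAKE_EXE_LINKER_FLAGS="-L/usr/local/lib -lopenfec"
  $ cmake --build build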
Figure 9. Partial+FEC reduced message-blocking and message inter-delivery time. Streaming updates, baseline (0.5% loss, 100 ms RTT, 100 Mbps, BBR): (a) CDF of message-blocking time. (b) Message inter-delivery time in which the number of points above 100 ms is 59 for blue, 18 for gold, 1 for red, and 1 for black. Blue completed at 15.23 s, gold at 15.58 s, red at 16.09 s, and black at 16.33 s.
Figure 10. Even for lower packet-loss rate than baseline, partial+FEC reduced message-blocking and message inter-delivery time. Streaming updates, 0.1% packet-loss rate (100 ms RTT, 100 Mbps, BBR): (a) CDF of message-blocking time. (b) Message inter-delivery time in which the number of points above 100 ms is 16 for blue, 6 for gold, 1 for red, and 1 for black. Blue completed at 15.45 s, gold at 16.01 s, red at 15.43 s, and black at 15.11 s.
Figure 11. Partial+FEC reduced message-blocking and message inter-delivery time even more with higher packet-loss rate than baseline. Streaming updates, 1% packet-loss rate (100 ms RTT, 100 Mbps, BBR): (a) CDF of message-blocking time. (b) Message inter-delivery time in which the number of points above 100 ms is 55 for blue, 28 for gold, 1 for red, and 1 for black. Blue completed at 15.38 s, gold at 16.06 s, red at 15.19 s, and black at 17.21 s.
Figure 12. Even for lower RTT than baseline, partial+FEC reduced message-blocking and message inter-delivery time. Streaming updates, 50 ms RTT (0.5% loss, 100 Mbps, BBR): (a) CDF of message-blocking time. (b) Message inter-delivery time in which the number of points above 50 ms is 57 for blue, 28 for gold, 0 for red, and 1 for black. Blue completed at 16.02 s, gold at 18.09 s, red at 15.97 s, and black at 18.1 s.
Figure 13. Partial+FEC reduced message-blocking and message inter-delivery time even more with higher RTT than baseline. Streaming updates, 200 ms RTT (0.5% loss, 100 Mbps, BBR): (a) CDF of message-blocking time. (b) Message inter-delivery time in which the number of points above 200 ms is 34 for blue, 22 for gold, 2 for red, and 1 for black. Blue completed at 17.94 s, gold at 16.18 s, red at 15.48 s, and black at 17.44 s.
Figure 14. Even for lower bandwidth than baseline, partial+FEC reduced message-blocking and message inter-delivery time. Streaming updates, 10 Mbps (0.5% loss, 100 ms RTT, BBR): (a) CDF of message-blocking time. (b) Message inter-delivery time in which the number of points above 100 ms is 71 for blue, 13 for gold, 0 for red, and 0 for black. Blue completed at 40.77 s, gold at 41.39 s, red at 41.24 s, and black at 42.88 s.
Figure 15. Partial+FEC also reduced message-blocking and message inter-delivery time for higher bandwidth than baseline. Streaming updates, 1 Gbps (0.5% loss, 100 ms RTT, BBR): (a) CDF of message-blocking time. (b) Message inter-delivery time in which the number of points above 100 ms is 46 for blue, 19 for gold, 1 for red, and 1 for black. Blue completed at 15.45 s, gold at 16.04 s, red at 16.15 s, and black at 16.92 s.
Figure 16. Partial+FEC reduced message-blocking and message inter-delivery time for the CUBIC CCA. Streaming updates, CUBIC (0.5% loss, 100 ms RTT, 100 Mbps): (a) CDF of message-blocking time. (b) Message inter-delivery time in which the number of points above 100 ms is 70 for blue, 40 for gold, 2 for red, and 3 for black. Blue completed at 66.98 s, gold at 68.12 s, red at 71.92 s, and black at 75.08 s.
Figure 17. The overhead of partial+FEC was less than 10 ms in terms of message inter-delivery time. Streaming updates, 0% packet-loss rate (100 ms RTT, 100 Mbps, BBR): (a) CDF of message-blocking time. (b) Message inter-delivery time. Blue completed at 15.47 s, gold at 16.38 s, red at 15.94 s, and black at 17.1 s.
Figure 18. Partial+FEC reduced update response time, message inter-delivery time, and the total completion time. Non-streaming updates, baseline (0.5% loss, 100 ms RTT, 100 Mbps, BBR): (a) CDF of update response time. (b) Message inter-delivery time in which the number of points above 100 ms is 61 for blue, 7 for gold, 0 for red, and 1 for black. Blue completed at 148.8 s, gold at 150.61 s, red at 136.64 s, and black at 139.14 s.
Figure 19. Even for lower packet-loss rate than baseline, partial+FEC reduced update response time, message inter-delivery time, and the total completion time. Non-streaming updates, 0.1% packet-loss rate (100 ms RTT, 100 Mbps, BBR): (a) CDF of update response time. (b) Message inter-delivery time in which the number of points above 100 ms is 14 for blue, 0 for gold, 1 for red, and 1 for black. Blue completed at 138.96 s, gold at 138.12 s, red at 134.51 s, and black at 135.61 s.
Figure 20. Partial+FEC reduced update response time and message inter-delivery time even more with higher packet-loss rate than baseline. Non-streaming updates, 1% packet-loss rate (100 ms RTT, 100 Mbps, BBR): (a) CDF of update response time. (b) Message inter-delivery time in which the number of points above 100 ms is 119 for blue, 10 for gold, 1 for red, and 1 for black. Blue completed at 149.12 s, gold at 150.37 s, red at 137.64 s, and black at 135.77 s.
Figure 21. Even for lower RTT than baseline, partial+FEC reduced update response time, message inter-delivery time, and the total completion time. Non-streaming updates, 50 ms RTT (0.5% loss, 100 Mbps, BBR): (a) CDF of update response time. (b) Message inter-delivery time in which the number of points above 50 ms is 70 for blue, 10 for gold, 1 for red, and 1 for black. Blue completed at 128.98 s, gold at 133.74 s, red at 123.46 s, and black at 122.56 s.
Figure 22. Partial+FEC reduced update response time and message inter-delivery time even more with higher RTT than baseline. Non-streaming updates, 200 ms RTT (0.5% loss, 100 Mbps, BBR): (a) CDF of update response time. (b) Message inter-delivery time in which the number of points above 200 ms is 55 for blue, 9 for gold, 0 for red, and 0 for black. Blue completed at 180.24 s, gold at 185.66 s, red at 167.06 s, and black at 167.88 s.
Figure 23. Even for lower bandwidth than baseline, partial+FEC reduced update response time, message inter-delivery time, and the total completion time. Non-streaming updates, 10 Mbps bandwidth (0.5% loss, 100 ms RTT, BBR): (a) CDF of update response time. (b) Message inter-delivery time in which the number of points above 100 ms is 79 for blue, 6 for gold, 0 for red, and 0 for black. Blue completed at 165.38 s, gold at 165.59 s, red at 158.97 s, and black at 158.2 s.
Figure 24. Partial+FEC also reduced update response time and message inter-delivery time for higher bandwidth than baseline. Non-streaming updates, 1 Gbps bandwidth (0.5% loss, 100 ms RTT, BBR): (a) CDF of update response time. (b) Message inter-delivery time in which the number of points above 100 ms is 72 for blue, 5 for gold, 0 for red, and 2 for black. Blue completed at 146.51 s, gold at 146 s, red at 139.73 s, and black at 138.71 s.
Figure 25. Partial+FEC reduced update response time, message inter-delivery time, and the total completion time for the CUBIC CCA. Non-streaming updates, CUBIC (0.5% loss, 100 ms RTT, 100 Mbps): (a) CDF of update response time. (b) Message inter-delivery time in which the number of points above 100 ms is 68 for blue, 13 for gold, 0 for red, and 0 for black. Blue completed at 159.68 s, gold at 157.12 s, red at 150.32 s, and black at 147.66 s.
Figure 26. Partial+FEC (n/k = 1.05) had no appreciable negative impact on update response time or message inter-delivery time with 0% packet-loss rate. Non-streaming updates, 0% packet-loss rate (100 ms RTT, 100 Mbps, BBR): (a) CDF of update response time. (b) Message inter-delivery time.
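To make the redundancy ratio in Figure 26 concrete: with a systematic block code, k source symbols are transmitted together with n − k repair symbols, so

\[ \frac{n}{k} = 1.05 \;\Rightarrow\; n - k = 0.05\,k, \qquad \text{e.g., } k = 20 \Rightarrow n - k = 1, \]

i.e., one repair symbol per 20 source symbols, or 5% extra data on the wire. (The block size k = 20 is illustrative; the paper's exact FEC block parameters are not restated here.)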
Table 1. Experimental setup on Emulab. Baseline settings are marked with an asterisk (*).
Parameter          Value
Workload           streaming updates, non-streaming updates
Packet loss        0%, 0.1%, 0.5% *, 1%
RTT                50 ms, 100 ms *, 200 ms
Bandwidth          10 Mbps, 100 Mbps *, 1 Gbps
CCA                BBR *, CUBIC
QUIC frame size    1400 B