Improving the Quality of Experience of Video Streaming Through a Buffer-Based Adaptive Bitrate Algorithm and Gated Recurrent Unit-Based Network Bandwidth Prediction

Woo, Jeonghun; Hong, Seungwoo; Kang, Donghyun; An, Donghyeok

doi:10.3390/app142210490

¹ Department of Computer Engineering, Changwon National University, Changwon 51140, Republic of Korea
² Network Research Department, Electronics and Telecommunications Research Institute (ETRI), Daejeon 34129, Republic of Korea
³ Department of Computer Engineering, College of IT Convergence, Gachon University, Seongnam-si 13120, Republic of Korea
* Authors to whom correspondence should be addressed.

Appl. Sci. 2024, 14(22), 10490; https://doi.org/10.3390/app142210490

Submission received: 23 August 2024 / Revised: 8 November 2024 / Accepted: 13 November 2024 / Published: 14 November 2024

(This article belongs to the Special Issue Multimedia Systems Studies)

Abstract

With the evolution of cellular networks and wireless-local-area-network-based communication technologies, new services for smart device users have appeared. With the popularity of 4G and 5G, smart device users can now consume larger bandwidths than before. Consequently, the demand for various services, such as streaming, online games, and video conferences, has increased. For improved quality of experience (QoE), streaming services utilize adaptive bitrate (ABR) algorithms to handle network bandwidth variations. ABR algorithms use network bandwidth history for future network bandwidth prediction, allowing them to perform efficiently when network bandwidth fluctuations are minor. However, in environments with frequent network bandwidth changes, such as wireless networks, the QoE of video streaming often degrades because of inaccurate predictions of future network bandwidth. To address this issue, we utilize the gated recurrent unit, a time series prediction model, to predict the network bandwidth accurately. We then propose a buffer-based ABR streaming technique that selects optimized video-quality settings on the basis of the predicted bandwidth. The proposed algorithm was evaluated on a dataset provided by Zenodo by categorizing instances of user mobility into walking, bus, and train scenarios. The proposed algorithm improved the QoE by approximately 11% compared with the existing buffer-based ABR algorithm in various environments.

Keywords: buffer-based ABR; quality of experience; bandwidth prediction; gated recurrent unit; video streaming

1. Introduction

The high penetration rate of smartphones and the development of 4G, 5G, and wireless local area network (WLAN)-based communication technologies have enabled users to experience high communication performance without restrictions on location or time. On the basis of these advancements in the network environment, demand for various services such as streaming, online games, and video conferencing has increased. In particular, the demand for video streaming is growing rapidly, leading to a significant increase in video streaming traffic within networks, as shown in Figure 1 [1]. Figure 1 depicts the traffic volume ratios for each app in 2022. Video traffic volume accounts for approximately 65% of the total. Consequently, the need for efficient transmission of media content has increased, prompting several studies to address this issue [2].

To provide stable and efficient video streaming, service providers split videos into short chunks, encode each chunk at different quality levels, and store them on the streaming server. When a user requests a streaming service, the provider uses an adaptive bitrate (ABR) algorithm to deliver the video according to the user’s network environment [3]. The ABR algorithm assesses the user’s network conditions to ensure continuous video playback. Based on this assessment, chunks of appropriate video quality are selected and transmitted.

Network performance needs to be measured to enable the ABR algorithm to select video chunks that match the network environment. An incorrect assessment of network performance may lead to the selection of high-quality video chunks that exceed the network’s actual bandwidth; consequently, rebuffering may occur, causing video playback to stop despite the high video quality. Conversely, if only low-quality video chunks are selected, despite the network being capable of supporting higher quality, the streaming quality will be low, reducing user satisfaction, even though rebuffering does not occur.

An ABR algorithm uses the network bandwidth observed while downloading the previous chunk to predict the bandwidth for downloading the next chunk. This approach works well in stable network environments, such as wired networks, where bandwidth remains consistent. However, in wireless environments, where network bandwidth can fluctuate significantly over time under varying channel conditions, this method reduces the accuracy of network performance prediction. When the accuracy of network performance prediction is low, it becomes challenging to provide optimal video streaming services, resulting in poor video quality despite buffering or frequent rebuffering.

The buffer-based ABR (BBA) algorithm [4] was proposed to address this issue. The BBA selects the video quality for the next chunk by considering both the current playback buffer occupancy and network environment, aiming to prevent rebuffering and provide optimal video quality under varying network conditions. If the buffer occupancy is low, then a lower video quality is selected to increase the buffer, whereas a higher video quality is chosen when the buffer occupancy is high. However, BBA determines the next chunk’s video quality based on the measured network bandwidth and buffer occupancy, and network bandwidth is measured based on the time taken to download the previous chunk and its size. Therefore, the algorithm may struggle to select the appropriate video quality if the network environment changes before the next chunk is downloaded. In wireless network environments, where communication conditions can fluctuate rapidly as a result of user mobility, BBA may select chunks with incorrect video quality, leading to rebuffering or the selection of low-quality chunks, thereby degrading the user’s quality of experience (QoE). Consequently, less rebuffering, fewer video rate variations, and the selection of high-rate video chunks are needed to increase QoE.

Several studies have been conducted to increase the QoE of video streaming service users. The network and buffer statuses were used to predict the bitrate of the following video chunk [4,5,6,7]. However, although these strategies are effective in a stable network environment, their performance may suffer when the network fluctuates dramatically. To deal with network bandwidth change, research has been performed to improve the user QoE of video streaming services using deep learning and reinforcement learning [8,9,10,11,12,13]. These studies, however, use reactive approaches that modify the bitrate in response to changes in the network state. Schemes that use mobile edge computing (MEC) have also been proposed to increase the real-time performance of video streaming [14,15,16,17]. However, to use MEC, the existing streaming service protocol must be modified.

This study proposes a gated recurrent unit (GRU)-based network performance prediction model and a buffer-based ABR algorithm that leverages the resulting predictions for efficient streaming service provision, even in environments with significant network fluctuations. First, we use the GRU, which is a time-series model, to predict network performance at the time of the next chunk download, improving the accuracy of network performance predictions for video streaming users in wireless environments. The GRU, a type of recurrent neural network (RNN), addresses the long-term dependency problem of conventional RNNs, as long short-term memory (LSTM) does, while being more lightweight than LSTM because it reduces the computations required to update the hidden state [18]. The GRU demonstrated a quicker learning curve and better prediction performance than LSTM in [19]. The main components of the GRU are the update gate and reset gate, which combine input data and the previous hidden state to create a new hidden state. This model efficiently learns from long sequence data. Second, by selecting chunks with appropriate video quality based on the predicted network bandwidth and current playback buffer occupancy, we improve the user QoE by minimizing rebuffering and unnecessary video quality changes compared with the existing BBA. The proposed scheme was evaluated using the throughput of an actual Long-Term Evolution (LTE) network to assess realistic performance and reduce dataset dependencies. The QoE of user streaming services was measured in various scenarios, including walking, riding a bus, and traveling by train, indicating an average improvement of 11% compared with the existing BBA. In summary, the main contributions of this paper are as follows:

We propose a GRU-based network bandwidth prediction model.
We select the rate of the next video chunk based on the predicted network bandwidth and buffer occupancy.
We evaluate and compare the QoE of the proposed scheme. Our scheme outperforms BBA in QoE by approximately 11%.

The remainder of this paper is organized as follows: Section 2 reviews studies and background knowledge relevant to this research. Section 3 discusses related works. Section 4 introduces the GRU-based network performance prediction method and video quality selection algorithm based on playback buffer occupancy. Section 5 presents performance evaluation and discusses the results. Finally, Section 6 concludes this paper and outlines directions for future research.

2. Background

This section describes the basic operating principles of the ABR algorithm and BBA.

2.1. ABR Algorithm

ABR algorithms continuously monitor changes in the user environment, such as network bandwidth and CPU usage, and transmit video content at an appropriate quality based on these factors [3]. Initially, when downloading a chunk encoded at the selected video quality, the network bandwidth is measured based on the download time and size of the video chunk. After downloading, the video chunk is stored in the playback buffer, and the videos stored in the buffer are played sequentially. The next chunk is downloaded when the size or playback time of the remaining video in the playback buffer falls below a certain threshold. The video quality of the next chunk is then selected based on the network bandwidth measured from the previous download. However, if the network environment changes suddenly, causing a significant drop in bandwidth, rebuffering can occur even if the previous network bandwidth was high, leading to all video chunks in the buffer being played.
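The measurement step described above can be sketched as follows. This is a minimal illustration, not the implementation used in this study: the function names `estimate_bandwidth` and `pick_rate` and the example bitrate ladder are hypothetical, and the selection rule (highest rate not exceeding the last measured bandwidth) is a simplifying assumption about a throughput-based ABR algorithm.

```python
from typing import Sequence


def estimate_bandwidth(chunk_size_bytes: float, download_time_s: float) -> float:
    """Bandwidth observed while downloading one chunk, in bytes per second."""
    if download_time_s <= 0:
        raise ValueError("download_time_s must be positive")
    return chunk_size_bytes / download_time_s


def pick_rate(bitrates: Sequence[float], bandwidth: float) -> float:
    """Pick the highest available rate that does not exceed the measured bandwidth.

    Falls back to the lowest rate when even that exceeds the bandwidth.
    (Assumed selection rule, for illustration only.)
    """
    candidates = [r for r in sorted(bitrates) if r <= bandwidth]
    return candidates[-1] if candidates else min(bitrates)


# Hypothetical example: a 2 MB chunk downloaded in 4 s gives 500,000 B/s.
measured = estimate_bandwidth(2_000_000, 4.0)
next_rate = pick_rate([300_000, 750_000, 1_200_000], measured)
```

Because the estimate depends only on the previous download, a sudden bandwidth drop is noticed only after the next chunk has already been requested, which is the weakness discussed in the remainder of this section.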

2.2. BBA

BBA is a derivative of the ABR algorithm designed on the basis of the correlation between buffer occupancy and the quality of the video to be downloaded next [4]. This algorithm uses the network bandwidth estimated from downloading the previous chunk and the buffer occupancy (the fraction of the playback buffer currently filled with video) as inputs. Figure 2 illustrates the method used for calculating buffer occupancy in BBA. The total play time of the chunks stored in the buffer is expressed in seconds, and the buffer occupancy is calculated using Equation (1): the play time of the video chunks currently stored in the buffer is divided by the total play time that the buffer can hold.

$$B(t) = \frac{\text{play time of stored video}}{\text{total play time of buffer}} \qquad (1)$$
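As a small worked illustration of Equation (1), the sketch below computes the occupancy from the play times of the buffered chunks. The function name `buffer_occupancy`, the 60 s buffer capacity, and the 4 s chunk durations are assumptions chosen only for the example.

```python
from typing import Sequence


def buffer_occupancy(stored_play_times_s: Sequence[float], buffer_capacity_s: float) -> float:
    """Equation (1): play time of stored video divided by total play time of the buffer."""
    if buffer_capacity_s <= 0:
        raise ValueError("buffer_capacity_s must be positive")
    stored = sum(stored_play_times_s)
    # Clamp to 1.0 so that an over-full buffer is reported as fully occupied.
    return min(stored / buffer_capacity_s, 1.0)


# Hypothetical example: three 4 s chunks in a 60 s buffer give an occupancy of 0.2.
occupancy = buffer_occupancy([4.0, 4.0, 4.0], 60.0)
```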

As shown in Figure 3, BBA selects video quality based on C’(t), the predicted network bandwidth, and B(t), the buffer occupancy. The network bandwidth measured during the download of the previous chunk is used as C’(t). The buffer occupancy B(t) calculated using Equation (1) is not directly used to select the video quality of the next chunk, but is adjusted using the adjustment coefficient function F(•). The BBA operates on the basis of these two values. Even if the predicted network bandwidth is low, indicating that the next chunk may take a long time to download, high video quality is selected if the buffer occupancy is high. Conversely, if the buffer occupancy is low, then low video quality is chosen, even if the predicted network bandwidth is high, to account for potential delays in video downloading. Consequently, BBA improves the user’s QoE by reducing the rebuffering rate by 10% to 20% compared with the ABR algorithm.

However, BBA is unsuitable for wireless network environments such as 4G, 5G, and wireless LANs. These environments change rapidly as a result of user mobility and frequent variations in the surrounding environment. For example, if a user is streaming videos when moving on foot, by bus, or by train, the network bandwidth fluctuates because the wireless communication environment between the base station and the user changes under the influence of factors such as communication distance and obstacles. BBA struggles to respond to these changes appropriately, as it assumes that the network conditions during the previous chunk download remain constant. This can lead to issues such as rebuffering or the selection of low video quality. To address these challenges, we use deep learning-based network performance prediction to adapt to rapid changes in wireless network environments.

3. Related Work

Numerous studies have been conducted to improve the QoE in video streaming. Approaches have been proposed for adjusting the bitrate in real time in response to network conditions acquired through reinforcement learning [5,9,10]. Tianchi et al. [9] employed reinforcement learning to optimize the heuristic-based adjustment coefficient function F(•) to enhance the user QoE. In contrast, we focus on the correlation between buffer occupancy and network bandwidth rather than adjustment coefficients. Kevin et al. [5] introduced a widely recognized play buffer-based ABR algorithm that improves user QoE by applying the Lyapunov optimization technique to the relationship between the network environment and playback buffer status. Although that study optimized the correlation between the playback buffer and network environment, we focus on predicting the network bandwidth, which is a different approach. Tianchi et al. [10] proposed a hybrid method that combines a buffer-based ABR algorithm with deep learning technology to improve QoE by selecting videos with the optimal quality. Proximal policy optimization for ABR (PPO-ABR) [13] employs a deep reinforcement learning technique to implement real-time bitrate optimization strategy for maximizing QoE. PPO-ABR selects the best policy based on the current network conditions. These studies utilized methods for predicting network conditions, buffer conditions, and user QoE, and for selecting policies through reinforcement learning. By contrast, our proposed scheme uses a GRU rather than reinforcement learning to predict the network bandwidth.

In [20], a bidirectional long short-term memory convolutional neural network (BiLSTM-CNN), LSTM, BiLSTM, support vector regression, and multilayer perceptron (MLP) models were used to predict the future QoE of users. The LL-GABR model proposed in [21] predicts user QoE by inputting video quality, latency, rebuffering, and energy consumption into a reinforcement learning framework. These studies employ a variety of deep learning and machine learning techniques to predict the user QoE, whereas we use a GRU to forecast the network bandwidth. In [22], network bandwidth is predicted using a bidirectional GRU (BiGRU). A novel adaptive bitrate selection algorithm (nABR) that selects the quality of the next video chunk based on the predicted bandwidth is proposed, and its performance is evaluated through numerical analysis. The GRU model and video chunk selection algorithm proposed in this paper have a lower computational overhead, and the performance was evaluated using network bandwidth measurement data based on user mobility.

Waqas et al. [23] selected the optimal video quality by considering buffer occupancy and estimated network conditions, based on real-time monitoring and network status analysis methods. They relied on conventional network performance indicators such as TCP segment transmission time, packet loss rate, and round-trip time, which differ from the time series network bandwidth prediction method employed in this study. In [24,25], MEC capable of delivering cache services is proposed to improve QoE and provide low latency for video streaming applications. In [26], a proof-of-concept for a mobile video-streaming service utilizing MEC is developed. Although these studies improve the QoE of video streaming with MEC, we improve the QoE of streaming services in non-MEC scenarios.

4. Methodology

This section describes the datasets and models used to predict bandwidth in a wireless network environment where video is streamed. The buffer-based ABR scheme is discussed.

4.1. Network Bandwidth-Learning Dataset

In a wireless network environment, the LTE dataset provided by Zenodo was used to obtain network bandwidth data that fluctuate with time and mobility [27,28]. The dataset reflects user mobility, with data collection locations varying over time. To measure performance under different network bandwidth variances, user mobility was categorized into pedestrian, bus, and train patterns. From November 2017 to February 2018, University College Cork measured 4G (LTE-A) network parameters such as downlink and uplink rates for pedestrian, bus, and train mobility patterns. Two operators walked multiple routes across Cork City Center, Ireland, to measure throughput in the pedestrian scenario. The two operators’ average movement speeds were 2.4 kph and 1.5 kph, respectively, with a minimum speed of 0 kph and maximum speeds of 4.0 kph and 3.0 kph. The bus data were gathered in urban and suburban locations using public transportation. The train data were collected on trains traveling 240 km between Cork City and Dublin City and 75 km between Cork City and Farranfore City in Ireland. In the bus and train scenarios, throughput was also measured by two operators. The average bus speeds were 17.2 kph and 10.7 kph, respectively. The average train speeds were 60.6 kph and 53.9 kph. The minimum speeds for both the bus and the train were zero. The maximum bus speeds were 34 kph and 30 kph, whereas the maximum train speeds were 109.4 kph and 114 kph. To eliminate bias in the bandwidth prediction model, we employed seven scenarios for each of the three mobility patterns. The entire dataset consisted of approximately 8 h of network bandwidth measurement data. The data were randomly split into learning and testing sets at a 7:3 ratio, with the learning set representing approximately 5 h and the testing set representing approximately 2 h. Figure 4 presents the network bandwidth statistics for the pedestrian, bus, and train datasets.

Figure 4a presents the measurement results for pedestrians. The average network bandwidth measured in the pedestrian dataset was 1373 KB/s with a maximum of 9910 KB/s and minimum of 0 KB/s. In the bus dataset, the average bandwidth was 1250 KB/s with a maximum of 10,984 KB/s and minimum of 0 KB/s. For the train dataset, the average bandwidth was 623 KB/s with a maximum of 21,627 KB/s and minimum of 0 KB/s. As mobility increased, the average network bandwidth decreased. Even when walking, there were instances of intermittent network loss; however, as shown in Figure 4b,c, the higher speeds of buses and trains led to more frequent interruptions in network access. Figure 5 presents the proportion of times network bandwidth was measured at 0 KB/s in each of the pedestrian, bus, and train datasets relative to the total time. Network stability decreased and bandwidth fluctuations became more pronounced as movement speed increased.

4.2. Network Bandwidth Prediction

As a preprocessing step, data were scaled to ensure stable training of the network bandwidth prediction model. Given the large variations in network bandwidth across different scenarios, min–max scaling was applied to normalize the data range to values between zero and one. To determine the proposed model’s hyperparameters, we measured the normalized root mean squared error (NRMSE) values for different numbers of layers, epochs, learning rates, and batch sizes. Figure 6 presents the performance measurement results for different hyperparameter settings. As shown in Figure 6a, the NRMSE value increased with the number of layers, whereas as shown in Figure 6b, it decreased with the number of epochs. As shown in Figure 6c,d, the highest prediction accuracy was achieved when the learning rate and batch size were 0.001 and 32, respectively. Based on these results, the proposed GRU model’s structure was defined as shown in Figure 7, and the hyperparameters were set as shown in Table 1. The input size corresponds to the size of the input data, which were the network bandwidth data in this study. The hidden state refers to the dimensionality of the hidden state vector, which the GRU layer passes from each time step of the sequence data to the next. In this study, the hidden state was set to two. The number of layers indicates the depth of the model, with one layer used in this study. The epoch count and learning rate were set to 400 and 0.001, respectively. The sequence length defines the length of the input sequence; the next value was predicted from the previous 20 data points. The batch size was set to 32. The output size refers to the number of output values, with the only output in this case being the predicted network bandwidth. We calculated the training phase time complexity of the GRU-based network bandwidth prediction model. According to [29,30], the time complexity of one gradient step of the GRU is $O(T d_{h}^{2} + T d_{h} d_{i})$, where $T$ is the length of the input sequence, $d_{h}$ is the dimension of the hidden state, and $d_{i}$ is the dimension of the input.
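The preprocessing described in this subsection can be sketched in plain Python as follows. This is an illustrative sketch rather than the code used in this study: the helper names `min_max_scale`, `make_windows`, and `gru_step_cost` are hypothetical, and the cost function simply evaluates the expression inside the O(·) bound, ignoring constant factors.

```python
from typing import List, Sequence, Tuple


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Scale values linearly into [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def make_windows(series: Sequence[float], seq_len: int = 20) -> Tuple[List[List[float]], List[float]]:
    """Build (input window, next value) pairs: seq_len past samples predict the next one."""
    inputs, targets = [], []
    for start in range(len(series) - seq_len):
        inputs.append(list(series[start:start + seq_len]))
        targets.append(series[start + seq_len])
    return inputs, targets


def gru_step_cost(seq_len: int, hidden_dim: int, input_dim: int) -> int:
    """Evaluate T*d_h^2 + T*d_h*d_i, the term inside the O(.) bound."""
    return seq_len * hidden_dim ** 2 + seq_len * hidden_dim * input_dim


# With the settings reported here (T = 20, d_h = 2, d_i = 1):
# 20 * 2^2 + 20 * 2 * 1 = 80 + 40 = 120 elementary operations, up to constants.
cost = gru_step_cost(20, 2, 1)
```

With a hidden state of dimension two and a single input feature, the per-step cost is dominated by the sequence length, which is consistent with the lightweight design goal of the model.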

4.3. Buffer-Based ABR Scheme

This section describes the proposed buffer-based ABR technique, which uses the predicted network bandwidth. Figure 8 presents the overall structure of the proposed scheme. When video streaming begins, 20 s chunks of the initially set video quality are downloaded, operating similarly to the conventional BBA technique. After a certain number of chunks are downloaded, the algorithm uses the GRU model to predict the network bandwidth at the time the next chunk will be downloaded.

The proposed technique determines the video quality of the next chunk based on the predicted bandwidth value and current playback buffer occupancy. The detailed operating principles are described in Algorithm 1. The algorithm’s input values include the video quality of the previous chunk, current buffer occupancy, and predicted network bandwidth. After the algorithm executes, the video quality of the next chunk is output. In the algorithm, current_Buffer refers to the current playback buffer occupancy, reservoir refers to the minimum playback buffer occupancy, bandwidth_prediction refers to the predicted network bandwidth value, and prev_rate refers to the video quality of the previous chunk. The lowest video quality is selected if the current playback buffer occupancy is lower than the reservoir. Conversely, if the current buffer occupancy is higher than the minimum buffer occupancy, and the predicted network bandwidth is one or two levels higher than the previous chunk’s video quality, the current video quality or one level higher is selected. Although the current buffer occupancy may be low, high-quality video can be stably received if the available network capacity is high. Because network environment changes may be temporary, the minimum buffer occupancy required for each chunk of video quality is specified, as shown in Figure 9, and is called current_cushion. The x-axis represents buffer occupancy, and the y-axis represents video quality. The adjustment coefficient function F(•) is used in cases other than those mentioned above. The adjustment coefficient is calculated as shown in Equation (2) using the current buffer occupancy, the predicted network bandwidth value, and the cushion value of the video quality currently being downloaded as inputs. 
The video quality is increased if the product of the current predicted network bandwidth value, buffer occupancy, and cushion range value of the video quality currently being downloaded exceeds the cushion value of the next level of buffer occupancy. Conversely, the video quality is reduced if the result is below the cushion value of the previous level of buffer occupancy. In all other cases, the current video quality is maintained. The next video rate selection algorithm is repeated as many times as the number of video chunks. When there are n video chunks, the algorithm’s time complexity is O(n).

$$F = \text{bandwidth\_prediction} \times \text{current\_buffer} \times \text{cushion\_range} \qquad (2)$$

Algorithm 1 Next video rate selection

Require: prev_rate: previous video rate
Require: current_buffer: current buffer occupancy
Require: bandwidth_prediction: predicted network bandwidth for next chunk
Ensure: next_rate: next video chunk rate

if current_buffer ≤ reservoir then
    next_rate = rate_min
else if bandwidth_prediction > (prev_rate + 2) * current_cushion then
    next_rate = prev_rate + 1
else if bandwidth_prediction > (prev_rate + 1) * current_cushion then
    next_rate = prev_rate
else if F(current_buffer, bandwidth_prediction, current_cushion) ≥ upper_cushion then
    next_rate = prev_rate + 1
else if F(current_buffer, bandwidth_prediction, current_cushion) < lower_cushion then
    next_rate = prev_rate - 1
else
    next_rate = prev_rate
end if
return next_rate
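Algorithm 1 can be transcribed into Python roughly as follows. This is a sketch under stated assumptions, not the implementation evaluated in this paper: video rates are treated as integer quality levels from 0 to 6 (seven levels), the cushion range in Equation (2) is taken to be `current_cushion`, the result is clamped to the valid level range, and all threshold values in the example are hypothetical and dimensionless.

```python
def adjustment(current_buffer: float, bandwidth_prediction: float, cushion_range: float) -> float:
    """Equation (2): product of predicted bandwidth, buffer occupancy, and cushion range."""
    return bandwidth_prediction * current_buffer * cushion_range


def next_video_rate(
    prev_rate: int,
    current_buffer: float,
    bandwidth_prediction: float,
    *,
    reservoir: float,
    current_cushion: float,
    lower_cushion: float,
    upper_cushion: float,
    rate_min: int = 0,
    rate_max: int = 6,
) -> int:
    """Select the quality level of the next chunk following the branches of Algorithm 1."""

    def clamp(rate: int) -> int:
        # Assumption: keep the result inside the available quality levels.
        return max(rate_min, min(rate_max, rate))

    if current_buffer <= reservoir:
        return rate_min  # buffer nearly empty: fall back to the lowest quality
    if bandwidth_prediction > (prev_rate + 2) * current_cushion:
        return clamp(prev_rate + 1)  # predicted bandwidth clearly supports a step up
    if bandwidth_prediction > (prev_rate + 1) * current_cushion:
        return prev_rate  # predicted bandwidth supports the current quality
    f = adjustment(current_buffer, bandwidth_prediction, current_cushion)
    if f >= upper_cushion:
        return clamp(prev_rate + 1)
    if f < lower_cushion:
        return clamp(prev_rate - 1)
    return prev_rate


# Hypothetical thresholds used only to exercise each branch.
params = dict(reservoir=0.1, current_cushion=0.5, lower_cushion=0.3, upper_cushion=0.6)
low_buffer = next_video_rate(3, 0.05, 10.0, **params)   # reservoir branch
step_up = next_video_rate(2, 0.5, 3.0, **params)        # bandwidth two levels above
step_down = next_video_rate(2, 0.5, 1.0, **params)      # F below lower cushion
```

Each call performs a constant number of comparisons, so running the selection once per chunk gives the O(n) cost for n chunks stated above.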

5. Performance Evaluation

In this section, we evaluate the accuracy of the proposed network bandwidth prediction model and QoE of the buffer-based ABR technique using the predicted values. We used the PyTorch framework ([31]) for the implementation of the GRU-based network bandwidth prediction model. Model training and simulation were conducted on a desktop equipped with an NVIDIA RTX 3060, AMD Ryzen 7 5800X 8-core 3.80 GHz processor, and 32 GB of RAM.

5.1. Performance Evaluation of the Network-Bandwidth Prediction Model

The performance of the network-bandwidth prediction model was evaluated using the dataset provided by Zenodo, as described in Section 4.1. The evaluation was conducted by categorizing user mobility into pedestrian, bus, and train scenarios. The pedestrian category included eight scenarios with a total of 8500 s of measurement data used for evaluation. Similarly, the bus and train categories each included eight scenarios with 6000 s of network bandwidth measurement data used for both categories.

Figure 10 presents the root mean square error (RMSE) of the predicted network bandwidth values. The RMSE values for walking, traveling by bus, and traveling by train were approximately 0.079, 0.08 and 0.06, respectively. A lower RMSE indicates higher prediction accuracy, demonstrating that the proposed network bandwidth prediction model performed well. The average RMSE across the 24 scenarios was 0.073, indicating that the model is highly accurate, even when the environment changes as a result of user mobility in a wireless network.
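For reference, the RMSE metric can be computed as in the following sketch. The function name `rmse` is illustrative, and the assumption that the error is computed on min-max scaled bandwidth values (which would explain the magnitudes below 0.1) is ours rather than stated explicitly in the text.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between measured and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# The three per-category values reported above average to the overall figure,
# assuming each category contributes the same number of scenarios.
category_rmse = [0.079, 0.08, 0.06]
overall = sum(category_rmse) / len(category_rmse)  # approximately 0.073
```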

Figure 11, Figure 12 and Figure 13 present the network bandwidth predictions for the scenarios with the best and worst RMSE values in the pedestrian, bus, and train scenarios, respectively. The scenario with the best RMSE is shown on the left, and that with the worst RMSE is shown on the right. In these figures, blue represents the actual measurement results, and yellow represents the predicted results. Figure 11a reveals that the proposed model predicted very similar bandwidths, even when the bandwidth changed abruptly at 100 s, and it handled small bandwidth changes effectively thereafter. As a result, the RMSE value is low. Figure 11b reveals that the proposed model predicted similar bandwidths, even when the bandwidth varied continually and drastically. However, the RMSE value was relatively high compared with that of the other pedestrian scenarios because the predicted bandwidth values for the rapid changes at approximately 50 and 650 s were inaccurate. Figure 12a and Figure 13a present cases in which bandwidth prediction was accurate for the bus and train scenarios. Because the network bandwidth changes were small, the prediction accuracy of the proposed model was high. Figure 12b and Figure 13b present scenarios with relatively low prediction accuracies. Despite significant variations in network bandwidth, the trend of bandwidth changes was predictable. However, the RMSE increased because the peak bandwidth prediction was somewhat lower than the measured peak bandwidth.

Overall, the proposed GRU-based model predicted the network bandwidth with relatively high accuracy, even when the network bandwidth changed as a result of user mobility. Therefore, the time-series prediction model is appropriate for predicting network bandwidth.

5.2. Performance Evaluation of the Buffer-Based ABR Technique Using Predicted Network Bandwidth Values

The QoE was assessed using the mean opinion score (MOS), which is a standard metric for measuring voice or video quality, to evaluate the performance of the proposed buffer-based ABR technique [19,32]. The QoE is affected primarily by video quality, video quality change, and rebuffering. The best QoE is attained when the highest video quality is streamed without rebuffering or quality degradation. The worst QoE is obtained when the lowest video quality is streamed and frequent rebuffering occurs. Even when the average video quality is medium, frequent video quality changes reduce the QoE by distracting users from their streaming. Therefore, as video quality improves, the QoE improves; however, as video quality changes and rebuffering increase, the QoE decreases. The QoE was calculated using Equation (3), which incorporates video quality, the amount of rebuffering, and the number of video quality changes, as these factors significantly impact user QoE. The maximum possible QoE value is 100. In Equation (3), VQ represents the video quality during streaming and is computed as the ratio of the average size of the selected chunks to the average size of all chunks at the highest video quality. Here, $C_{s}$ is the chunk size of the selected video quality, and $C_{\max}$ is the chunk size at the highest video quality. VQ increases with increasing video quality as the average size of the selected chunks increases, and it decreases with decreasing video quality as the average size of the chunks decreases. QC denotes the video quality change value, which is calculated by dividing $Q_{c}$ (the number of video quality changes) by $N_{c}$ (the total number of chunks) and multiplying the result by $\alpha$. RC represents the rebuffering change value, which is obtained by dividing $R_{c}$ (the number of rebuffering occurrences) by $N_{c}$ and multiplying the result by $\beta$. Because changes in video quality and rebuffering negatively affect QoE, the QoE score is calculated by subtracting these two values from VQ. The parameters $\alpha$ and $\beta$ were set to 100 and 500, respectively, reflecting the greater negative impact of rebuffering on QoE compared with video quality changes.

\begin{matrix} QoE & = VQ - (QC + RC) \\ VQ & = (\frac{C_{s}}{C_{\max}}) \times 100 \\ QC & = (\frac{Q_{c}}{N_{c}}) \times α \\ RC & = (\frac{R_{c}}{N_{c}}) \times β \end{matrix}

(3)
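As a minimal sketch of how Equation (3) could be evaluated for one streaming session, the following Python function computes VQ, QC, and RC from per-chunk sizes and event counts. The function name `qoe_score`, its argument names, and the example numbers are illustrative assumptions and do not come from the evaluation itself; only the formula and the values of α and β follow the text.

```python
def qoe_score(selected_sizes, max_sizes, quality_changes, rebuffer_events,
              alpha=100.0, beta=500.0):
    """Evaluate Equation (3) for one session (hypothetical helper).

    selected_sizes: size of the chunk actually streamed, one entry per chunk
    max_sizes:      size of the same chunk at the highest video quality
    """
    n_chunks = len(selected_sizes)
    if n_chunks == 0 or n_chunks != len(max_sizes):
        raise ValueError("need one selected size and one maximum size per chunk")

    c_s = sum(selected_sizes) / n_chunks      # average selected chunk size
    c_max = sum(max_sizes) / n_chunks         # average size at highest quality

    vq = c_s / c_max * 100.0                  # video quality term, at most 100
    qc = quality_changes / n_chunks * alpha   # quality-change penalty
    rc = rebuffer_events / n_chunks * beta    # rebuffering penalty
    return vq - (qc + rc)


# Illustrative session (made-up numbers): 10 chunks streamed at 80% of the
# maximum size, with one quality change and no rebuffering.
example = qoe_score([800.0] * 10, [1000.0] * 10,
                    quality_changes=1, rebuffer_events=0)
```

With these made-up inputs, VQ is 80 and QC is 10, so the score is about 70; a single rebuffering event in the same session would subtract a further 50 points, which illustrates how strongly β = 500 weights rebuffering.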

The performances of the proposed technique and existing BBA were compared. Evaluations were conducted using simulations and videos provided by PBS, an American public broadcasting company. The video consisted of 215 chunks, each encoded with seven different video qualities. The network environment was modeled based on the LTE dataset described in Section 4.1, which was used to predict network bandwidth in Section 5.1.
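The simulation setup can be pictured with a small chunk-level loop, sketched below under stated assumptions. The chunk duration (4 s), the buffer cap, the reservoir and cushion values, and the choice to sample the bandwidth trace once per chunk are all hypothetical simplifications, and `linear_buffer_map` is only a simplified stand-in for a buffer-based rule, not the BBA implementation or the proposed technique evaluated here.

```python
def linear_buffer_map(buffer_s, n_levels, reservoir_s=8.0, cushion_s=24.0):
    """Map buffer occupancy to a quality level (simplified, hypothetical rule)."""
    if buffer_s <= reservoir_s:
        return 0                                   # protect against stalls
    if buffer_s >= reservoir_s + cushion_s:
        return n_levels - 1                        # buffer is comfortable
    frac = (buffer_s - reservoir_s) / cushion_s
    return int(frac * (n_levels - 1))


def simulate_session(chunk_sizes_kb, bandwidth_kbps, choose_level,
                     chunk_seconds=4.0, max_buffer_s=60.0, min_bw_kbps=1.0):
    """Stream all chunks and count quality changes and rebuffering events.

    chunk_sizes_kb: per chunk, a list of sizes in KB (one per quality level)
    bandwidth_kbps: bandwidth trace in kbit/s, sampled once per chunk (assumed)
    """
    buffer_s = 0.0
    quality_changes = 0
    rebuffer_events = 0
    levels = []
    for i, sizes in enumerate(chunk_sizes_kb):
        level = choose_level(buffer_s, len(sizes))
        if levels and level != levels[-1]:
            quality_changes += 1
        levels.append(level)

        # Floor the bandwidth so a zero sample does not divide by zero.
        bw = max(bandwidth_kbps[i % len(bandwidth_kbps)], min_bw_kbps)
        download_s = sizes[level] * 8.0 / bw       # KB -> kbit, then seconds

        # Playback stalls if the download outlasts the buffered video.
        # The very first chunk is treated as startup delay, not rebuffering.
        if i > 0 and download_s > buffer_s:
            rebuffer_events += 1

        buffer_s = min(max(buffer_s - download_s, 0.0) + chunk_seconds,
                       max_buffer_s)
    return levels, quality_changes, rebuffer_events
```

The two counters returned by `simulate_session` correspond to the number of video quality changes and the number of rebuffering occurrences that enter Equation (3); a prediction-aware rule could be compared against the buffer-only rule by passing a different `choose_level` function.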

First, we measured the QoE while walking; Figure 14 presents the MOS results for each pedestrian scenario. In this figure, blue represents the MOS of BBA, and orange represents the MOS of the proposed technique. The proposed technique achieved an average MOS of 60, whereas BBA achieved an average MOS of 56, indicating that the proposed technique offers approximately 7% better QoE than BBA. The standard deviation is approximately 13 for the proposed method and approximately 17 for BBA, which means that the QoE of the proposed method is more stable. In the pedestrian scenarios, the overall change in network bandwidth is not significant, but instantaneous fluctuations occur because multiple users access the same network simultaneously. Figure 10 shows that pedestrian scenarios 1, 2, and 7 have a low RMSE, indicating that the bandwidth prediction is accurate. As demonstrated in Figure 14, the proposed technique achieves a higher MOS than BBA in pedestrian scenarios 1, 2, and 7 owing to accurate bandwidth prediction and the selection of appropriate video rate chunks.

Table 2 presents detailed performance indicators for scenario 2, in which the performance difference between the two methods was the most pronounced. The average video qualities were similar, with the proposed method at 2624 KB and BBA at 2615 KB. However, the proposed method yielded 22 video quality changes, approximately 27% fewer than BBA's 30 changes, and had only 15 rebuffering events, which is less than half of BBA's 32. Figure 15 and Figure 16 display the selected video quality and buffer occupancy, respectively, in response to network bandwidth changes in pedestrian scenario 2. The blue curve represents BBA, the orange curve represents the proposed technique, and the red curve represents the network bandwidth. Initially, as the network bandwidth was stable, video quality and buffer occupancy increased simultaneously. Although the network bandwidth began to decrease after approximately 100 s, the video quality remained stable because video had already been buffered. BBA did not begin to reduce video quality until approximately 500 s, and its buffer occupancy temporarily dropped to zero. By contrast, the proposed technique avoided rebuffering at this point by lowering video quality from approximately 450 s.

Figure 17 presents the MOS measurement results for each scenario when traveling by bus. The proposed technique achieved an average MOS of 76, which is approximately 5% better than BBA's average MOS of 72. The standard deviation is approximately 9 for the proposed method and approximately 10 for BBA, which means that the two schemes are similarly stable. Although the MOS values of the two techniques were similar in most scenarios, the proposed technique delivered a superior QoE in scenario 7. As detailed in Table 3, the average video quality selected by the two techniques was comparable, but the proposed technique made fewer video quality changes (16 versus 20). The number of rebuffering events was 2 for the proposed technique, compared with 12 for BBA. This reduction in rebuffering contributed to the higher MOS of the proposed technique. Changes in the network environment while traveling by bus were more significant than those while walking, with the network bandwidth often converging to zero for longer durations. Figure 18 and Figure 19 present the selected video quality and buffer occupancy in response to network bandwidth changes in bus scenario 7. The network bandwidth approached 0 between 350 and 400 s. BBA reduced the video quality only gradually, so its buffer occupancy remained low for a period. In contrast, the proposed technique reduced video quality more quickly, leading to a faster increase in buffer occupancy. By responding swiftly to decreases in network bandwidth and selecting the lowest video quality as soon as a drop is detected, the proposed technique effectively minimizes rebuffering.

Finally, the QoE was measured in the train scenarios. Figure 20 presents the MOS measurement results for each train travel scenario. The proposed technique achieved an average MOS of 65, which is significantly greater than BBA's average MOS of 45. The standard deviation is approximately 14 for the proposed method and approximately 27 for BBA, which means that the QoE of the proposed method is more stable. Table 4 details the QoE measurement indicators for train scenario 1. The average video quality was 3762 KB for the proposed method and 3668 KB for BBA, indicating that the proposed method selected higher video quality than BBA. Both techniques had the same number of video quality changes (25), but 119 rebuffering events occurred with BBA, whereas only 28 occurred with the proposed technique. Figure 21 and Figure 22 present the selected video quality and buffer occupancy in response to network bandwidth changes in train scenario 1. After 200 s, as the network bandwidth decreased, the buffer occupancy also decreased. Consequently, both techniques began reducing video quality at approximately 500 s. As the network environment improved, the proposed technique increased video quality more rapidly than BBA. Although the buffer occupancy of the proposed technique was lower than that of BBA, the video quality remained comparable. As the network bandwidth gradually decreased starting at 800 s and converged to zero at approximately 1000 s, rebuffering occurred in both techniques. However, while BBA struggled to respond to instantaneous bandwidth changes, leading to prolonged rebuffering, the proposed technique quickly reduced video quality, preventing a significant drop in buffer occupancy and minimizing rebuffering.
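The relative gains for the three mobility types can be checked directly from the reported average MOS values. The arithmetic below uses only the rounded averages quoted in the text (60 versus 56, 76 versus 72, and 65 versus 45), so it is an approximation and may differ slightly from percentages computed on the unrounded per-scenario scores.

```latex
\begin{aligned}
\text{Pedestrian:}\quad & \frac{60 - 56}{56} \times 100 \approx 7.1\% \\
\text{Bus:}\quad        & \frac{76 - 72}{72} \times 100 \approx 5.6\% \\
\text{Train:}\quad      & \frac{65 - 45}{45} \times 100 \approx 44.4\%
\end{aligned}
```

The train scenarios therefore account for most of the difference between the two techniques, which is consistent with the much larger gap in rebuffering events reported for train scenario 1.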

In our performance evaluations across 24 pedestrian, bus, and train scenarios, the proposed technique achieved a higher MOS than BBA in 11 scenarios, whereas BBA outperformed the proposed technique in 6 scenarios. Overall, the MOS of the proposed technique was approximately 40% greater than that of BBA. In the six scenarios where BBA had a higher MOS, the difference was approximately 7%. Therefore, the proposed technique not only delivers superior QoE compared with BBA in environments with significant network fluctuations, such as on trains, but also offers QoE comparable to that of BBA in more stable network environments, such as those experienced by pedestrians and bus passengers.

In the performance evaluation results for the pedestrian, bus, and train scenarios, the proposed scheme yielded a higher overall QoE than did BBA. In the train scenario, where users moved rapidly, the average video quality improved, and fewer rebuffering events were measured. These results indicate that video rate selection based on the proposed accurate bandwidth prediction method is effective, even when the network environment varies dramatically, resulting in improved overall performance. However, the proposed scheme has limitations. When the network bandwidth fluctuates little, as in the pedestrian and bus scenarios, the average video quality improvement is insignificant. To improve the performance of the proposed scheme in practice, the network bandwidth prediction should be rapid even without a GPU.

6. Conclusions

We proposed a GRU-based network bandwidth prediction model and a buffer-based ABR algorithm to enhance the QoE of video streaming applications. The proposed GRU model predicted network bandwidth with reasonable accuracy, with an average RMSE of 0.079 across scenarios involving pedestrian, bus, and train movement. Our buffer-based ABR scheme yielded approximately 11% higher QoE than the existing BBA in various mobility scenarios. This implies that our scheme delivers high-quality video to users while reducing rebuffering and video quality changes. Our key contributions include accurate bandwidth prediction using the GRU model, optimal video quality selection based on predicted bandwidth and buffer occupancy, and enhanced performance in networks with substantial fluctuations. Future work will focus on improving the QoE of real-time video-streaming applications in wireless networks.

Author Contributions

Conceptualization, J.W., S.H. and D.A.; methodology, J.W. and D.A.; software, J.W.; validation, J.W., D.K. and D.A.; writing—original draft preparation, J.W. and D.A.; writing—review and editing, J.W., S.H., D.K. and D.A.; visualization, J.W. and D.A.; supervision, D.K. and D.A.; project administration, S.H. and D.A.; funding acquisition, S.H. and D.A. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Electronics and Telecommunications Research Institute (ETRI) grant funded by ICT R&D program of MSIT/IITP [2021-0-00715, Development of End-to-End Ultra-high Precision Network Technologies]. This research was supported by “Regional Innovation Strategy (RIS)” through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (MOE) (2021RIS-005).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

GRU	Gated Recurrent Unit
ABR	Adaptive Bitrate
RNN	Recurrent Neural Network
RTT	Round-Trip Time
RMSE	Root Mean Squared Error
MOS	Mean Opinion Score

References

1. Sandvine. Global Internet Phenomena Report. 2023. Available online: https://www.sandvine.com/hubfs/Sandvine_Redesign_2019/Downloads/2023/reports/Sandvine%20GIPR%202023.pdf (accessed on 26 August 2024).
2. Wang, M.; Xu, C.; Jia, S.; Muntean, G.M. Video streaming distribution over mobile Internet: A survey. Front. Comput. Sci. 2018, 12, 1039–1059.
3. Bentaleb, A.; Taani, B.; Begen, A.C.; Timmerer, C.; Zimmermann, R. A survey on bitrate adaptation schemes for streaming media over HTTP. IEEE Commun. Surv. Tutorials 2018, 21, 562–585.
4. Huang, T.Y.; Johari, R.; McKeown, N.; Trunnell, M.; Watson, M. A buffer-based approach to rate adaptation: Evidence from a large video streaming service. In Proceedings of the 2014 ACM Conference on SIGCOMM, Chicago, IL, USA, 17–22 August 2014; pp. 187–198.
5. Spiteri, K.; Urgaonkar, R.; Sitaraman, R.K. BOLA: Near-optimal bitrate adaptation for online videos. IEEE/ACM Trans. Netw. 2020, 28, 1698–1711.
6. Spiteri, K.; Sitaraman, R.; Sparacio, D. From theory to practice: Improving bitrate adaptation in the DASH reference player. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2019, 15, 1–29.
7. Jiang, J.; Sekar, V.; Zhang, H. Improving fairness, efficiency, and stability in http-based adaptive video streaming with festive. In Proceedings of the 8th International Conference on Emerging Networking Experiments and Technologies, Nice, France, 10–13 December 2012; pp. 97–108.
8. Lekharu, A.; Moulii, K.; Sur, A.; Sarkar, A. Deep learning based prediction model for adaptive video streaming. In Proceedings of the 2020 International Conference on COMmunication Systems & NETworkS (COMSNETS), Bengaluru, India, 7–11 January 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 152–159.
9. Huang, T.; Zhang, R.X.; Yao, X.; Wu, C.; Sun, L. Being more effective and interpretable: Bridging the gap between heuristics and AI for ABR algorithms. In Proceedings of the ACM SIGCOMM 2019 Conference Posters and Demos, Beijing, China, 19–24 August 2019; pp. 12–14.
10. Huang, T.; Zhou, C.; Zhang, R.X.; Wu, C.; Yao, X.; Sun, L. Stick: A harmonious fusion of buffer-based and learning-based approach for adaptive streaming. In Proceedings of the IEEE INFOCOM 2020-IEEE Conference on Computer Communications, Virtually, 6–9 July 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1967–1976.
11. Souane, N.; Bourenane, M.; Douga, Y. Deep reinforcement learning-based approach for video streaming: Dynamic adaptive video streaming over HTTP. Appl. Sci. 2023, 13, 11697.
12. Naresh, M.; Gireesh, N.; Saxena, P.; Gupta, M. Sac-abr: Soft actor-critic based deep reinforcement learning for adaptive bitrate streaming. In Proceedings of the 2022 14th International Conference on COMmunication Systems & NETworkS (COMSNETS), Bengaluru, India, 4–8 January 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 353–361.
13. Naresh, M.; Saxena, P.; Gupta, M. Ppo-abr: Proximal policy optimization based deep reinforcement learning for adaptive bitrate streaming. In Proceedings of the 2023 International Wireless Communications and Mobile Computing (IWCMC), Marrakesh, Morocco, 19–23 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 199–204.
14. Ravindran, A.A. Internet-of-Things Edge Computing Systems for Streaming Video Analytics: Trails Behind and the Paths Ahead. IoT 2023, 4, 486–513.
15. Taleb, T.; Samdanis, K.; Mada, B.; Flinck, H.; Dutta, S.; Sabella, D. On multi-access edge computing: A survey of the emerging 5G network edge cloud architecture and orchestration. IEEE Commun. Surv. Tutorials 2017, 19, 1657–1681.
16. Kanai, K.; Imagane, K.; Katto, J. Overview of multimedia mobile edge computing. ITE Trans. Media Technol. Appl. 2018, 6, 46–52.
17. Zhang, Q.; Sun, H.; Wu, X.; Zhong, H. Edge video analytics for public safety: A review. Proc. IEEE 2019, 107, 1675–1696.
18. Cho, K. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078.
19. Chung, J. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 2014, arXiv:1412.3555.
20. Dinaki, H.E.; Shirmohammadi, S.; Janulewicz, E.; Côté, D. Forecasting video QoE with deep learning from multivariate time-series. IEEE Open J. Signal Process. 2021, 2, 512–521.
21. Raman, A.; Turkkan, B.; Kosar, T. LL-GABR: Energy Efficient Live Video Streaming Using Reinforcement Learning. arXiv 2024, arXiv:2402.09392.
22. Huu, T.V.; Huong, T.N.T.; Le, H.C. QoE Aware Video Streaming Scheme Utilizing GRU-based Bandwidth Prediction And Adaptive Bitrate Selection For Heterogeneous Mobile Networks. IEEE Access 2024, 12, 45785–45795.
23. ur Rahman, W.; Chung, K. Buffer-based adaptive bitrate algorithm for streaming over HTTP. KSII Trans. Internet Inf. Syst. (TIIS) 2015, 9, 4585–4603.
24. Jiang, X.; Yu, F.R.; Song, T.; Leung, V.C. A survey on multi-access edge computing applied to video streaming: Some research issues and challenges. IEEE Commun. Surv. Tutorials 2021, 23, 871–903.
25. Bilal, K.; Erbad, A. Edge computing for interactive media and video streaming. In Proceedings of the 2017 Second International Conference on Fog and Mobile Edge Computing (FMEC), Valencia, Spain, 8–11 May 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 68–73.
26. Yang, S.R.; Tseng, Y.J.; Huang, C.C.; Lin, W.C. Multi-access edge computing enhanced video streaming: Proof-of-concept implementation and prediction/QoE models. IEEE Trans. Veh. Technol. 2018, 68, 1888–1902.
27. LTE Dataset. 2023. Available online: https://www.kaggle.com/datasets/aeryss/lte-dataset (accessed on 26 August 2024).
28. Raca, D.; Quinlan, J.J.; Zahran, A.H.; Sreenan, C.J. Beyond throughput: A 4G LTE dataset with channel and context metrics. In Proceedings of the 9th ACM Multimedia Systems Conference, Amsterdam, The Netherlands, 12–15 June 2018; pp. 460–465.
29. Rotman, M.; Wolf, L. Shuffling recurrent neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 2–9 February 2021; Volume 35, pp. 9428–9435.
30. Lee, M.C. Research on the feasibility of applying GRU and attention mechanism combined with technical indicators in stock trading strategies. Appl. Sci. 2022, 12, 1007.
31. Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A. Automatic Differentiation in Pytorch. 2017. Available online: https://openreview.net/forum?id=BJJsrmfCZ (accessed on 26 August 2024).
32. Streijl, R.C.; Winkler, S.; Hands, D.S. Mean opinion score (MOS) revisited: Methods and applications, limitations and alternatives. Multimed. Syst. 2016, 22, 213–227.

Figure 1. Total volume of app categories in 2022.

Figure 2. Buffer occupancy calculation.

Figure 3. Video quality selection in the buffer-based adaptive bitrate algorithm.

Figure 4. Bandwidth measurement results.

Figure 5. Ratio of measurements recorded as zero among total measurements.

Figure 6. Normalized root mean squared error (NRMSE) values with different hyperparameter settings.

Figure 7. GRU model structure.

Figure 8. Structure of the proposed scheme.

Figure 9. Relationship between video rate and buffer occupancy.

Figure 10. Root mean square error (RMSE) of network bandwidth prediction in different mobility scenarios.

Figure 11. Comparison of the measured and predicted network bandwidths in the pedestrian scenario.

Figure 12. Comparison of the measured and predicted network bandwidths in the bus scenario.

Figure 13. Comparison of the measured and predicted network bandwidths in the train scenario.

Figure 14. Comparison of mean opinion scores in pedestrian scenarios.

Figure 15. Comparison of video quality in pedestrian scenario 2.

Figure 16. Comparison of buffer occupancies in pedestrian scenario 2.

Figure 17. Comparison of mean opinion scores in bus scenarios.

Figure 18. Comparison of video quality in bus scenario 7.

Figure 19. Comparison of buffer occupancy in bus scenario 7.

Figure 20. Comparison of the mean opinion scores in train scenarios.

Figure 21. Comparison of video quality in train scenario 1.

Figure 22. Comparison of buffer occupancy in train scenario 1.

Table 1. Parameters for training the gated recurrent unit model.

Input size	1
Hidden state	2
Number of layers	1
Epochs	400
Learning rate	0.001
Sequence length	20
Batch size	32
Output size	1
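To make the dimensions in Table 1 concrete, the sketch below runs a GRU forward pass in plain Python with input size 1, hidden size 2, one layer, a 20-sample input window, and a linear read-out to a single predicted value. It is only an illustration of the shapes involved: the weights are random rather than trained, the gate equations follow the commonly used PyTorch-style convention (an assumption about the implementation), and the names `init_params`, `gru_step`, and `predict_next` are hypothetical.

```python
import math
import random

HIDDEN = 2      # "Hidden state" in Table 1
SEQ_LEN = 20    # "Sequence length" in Table 1


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def init_params(seed=0):
    """Small random weights; a trained model would load learned values instead."""
    rng = random.Random(seed)

    def vec():
        return [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)]

    params = {}
    for gate in ("r", "z", "n"):                              # reset, update, candidate
        params["w_i" + gate] = vec()                          # input weights (input size 1)
        params["w_h" + gate] = [vec() for _ in range(HIDDEN)] # recurrent weights (2 x 2)
        params["b_i" + gate] = vec()
        params["b_h" + gate] = vec()
    params["w_out"] = vec()                                   # read-out to output size 1
    params["b_out"] = 0.0
    return params


def gru_step(x, h, p):
    """One GRU update for a scalar input x and hidden state h (HIDDEN floats)."""
    def rec(gate, j):
        # Recurrent contribution W_h h + b_h for unit j of the given gate.
        return (sum(p["w_h" + gate][j][k] * h[k] for k in range(HIDDEN))
                + p["b_h" + gate][j])

    new_h = []
    for j in range(HIDDEN):
        r = sigmoid(p["w_ir"][j] * x + p["b_ir"][j] + rec("r", j))
        z = sigmoid(p["w_iz"][j] * x + p["b_iz"][j] + rec("z", j))
        n = math.tanh(p["w_in"][j] * x + p["b_in"][j] + r * rec("n", j))
        new_h.append((1.0 - z) * n + z * h[j])
    return new_h


def predict_next(window, p):
    """Feed a window of normalized bandwidth samples and read out one value."""
    h = [0.0] * HIDDEN
    for x in window:
        h = gru_step(x, h, p)
    return sum(w * v for w, v in zip(p["w_out"], h)) + p["b_out"]


# Illustrative call with a synthetic, normalized 20-sample window.
window = [0.1 * (i % 5) for i in range(SEQ_LEN)]
prediction = predict_next(window, init_params(seed=1))
```

With a hidden size of 2, the recurrent state is only two numbers per step, which keeps the per-prediction cost very low; the remaining entries of Table 1 (epochs, learning rate, and batch size) concern training and are not exercised by this forward-only sketch.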

Table 2. Comparison of the proposed scheme and the buffer-based adaptive bitrate (BBA) algorithm in pedestrian scenario 2.

	Proposed Technique	BBA
Average video quality	2642 KB	2615 KB
Number of video quality changes	22	30
Number of rebufferings	15	32

Table 3. Comparison of the proposed scheme and the buffer-based adaptive bitrate (BBA) algorithm in bus scenario 7.

	Proposed Technique	BBA
Average video quality	4060 KB	4066 KB
Number of video quality changes	16	20
Number of rebufferings	2	12

Table 4. Comparison of the proposed scheme and the buffer-based adaptive bitrate (BBA) algorithm in train scenario 1.

	Proposed Technique	BBA
Average video quality	3762 KB	3668 KB
Number of video quality changes	25	25
Number of rebufferings	28	119

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

