1. Introduction
Multimedia streaming has grown rapidly over the last decade, changing how people watch video. Well-known services such as YouTube, Netflix, Prime Video, Hulu, and Disney+ have transformed the way people consume video content, and the majority of Internet traffic now originates from video streaming applications [1]. This growth has been supported by a set of technologies introduced in the last decades that made streaming solutions practical for everyone. Among others, Dynamic Adaptive Streaming over HTTP (DASH) made it possible to stream large video content while adapting to variable network conditions and exploiting the HTTP protocol [2], hence overcoming older protocols (e.g., the Real-time Transport Protocol, RTP) that suffer from reduced client compatibility and gateway port filtering.
Implementing a video streaming application with these protocols requires a large amount of storage space and computing effort. On the one hand, videos require considerable storage: despite the recently introduced, more efficient encoding standards, video remains one of the heaviest content types in terms of data size [3]. On the other hand, the growing number of devices, video codecs, and user requirements has pushed towards scalable approaches in which the same video content is adapted to the users' requests [4], which requires additional processing capabilities. Therefore, cloud computing facilities soon became one of the main enabling technologies for video streaming services [5].
Despite considerable efforts to integrate cloud computing in streaming scenarios, recent years have seen an impressive increase in multimedia content, which has highlighted some disadvantages inherent in the centralized cloud computing approach, mainly high latency and limited scalability. In future 5G mobile networks, the low end-to-end communication latency and the high available bandwidth allow mobile users to stream with high quality. Fog [6,7] and edge [8,9] computing paradigms, by bringing the services closer to the users, have attracted the interest of the multimedia community as a feasible way of addressing latency and scalability, and hence of satisfying the requirements of 5G networks. On the one hand, edge and fog computing nodes can be installed nearer to the users, reducing the latency; on the other hand, their number can be increased, mitigating the scalability problem [10,11,12].
In this work, we aim at integrating the DASH approach into a fog computing scenario. In particular, by exploiting the DASH media segment organization, we propose a joint storage and processing approach that leverages a fog computing infrastructure to optimally place the video segments in order to maximize the user experience. The proposed approach takes into account the limited amount of data that each fog node can store, so as to build an implementable solution for delivering videos at the edge. In addition, since videos are usually organized in multiple quality levels targeted to different user requests, we integrate an on-demand approach that jointly considers video quality scalability and transcoding. Scalable video coding (SVC) interleaves different streams, typically one basic stream and one additional stream, so as to increase the quality from the basic level to the targeted higher one, whereas transcoding changes the quality of a stored video to fit the user request [13]. To this aim, a procedure is proposed for integrating both scalability and transcoding in the fog computing architecture, with the goal of minimizing the overall video delivery latency. This is possible in DASH, where the video content management is based on an information collecting procedure and the video content is organized through a media presentation description (MPD) format [2]. The MPD includes segment information such as video resolution and bit rates, used as metrics for the quality of experience (QoE) evaluation. Based on the target quality level, a decision is made on whether to apply transcoding or scalable video coding.
Despite some recent research in this field, the possibility of exploiting fog and edge nodes for assisting DASH-based video delivery has not yet been explored in depth. We focus on two algorithms that optimize the segment placement in a distributed fog network. Both communication and storage constraints are considered, with the aim of placing the video segments optimally while respecting both target quality and latency requirements. The first proposed algorithm treats the target quality as a primary constraint, whereas in the second algorithm the quality can be degraded when the latency requirement cannot be met. In particular, both SVC (based on a cloud repository) and transcoding (based on an edge server) techniques are considered when evaluating the latency and the quality requirements.
The paper is structured as follows. In Section 2, the literature on video on demand (VoD) and its implementation through edge and fog computing infrastructures is reviewed. In Section 3, the system model is described, with a particular focus on the latency model, while in Section 4, the problem is formulated by introducing a cost function that maps the considered scenario. The proposed algorithms are explained in Section 5, while in Section 6, the numerical results obtained through computer simulations are shown. Finally, conclusions are drawn in Section 7.
  2. Literature Review
Multimedia is considered one of the major application areas for 5G and beyond-5G (B5G) services [14]. For example, video streaming and video-on-demand applications are regarded as pillars of smart city services [15], and several papers have developed specific solutions for including videos in different smart city applications. In [16], the authors proposed a smart city framework for video surveillance that exploits multiple cameras disseminated within a city. This framework, named UTOPIA, performs video analysis by exploiting MapReduce. The idea of using videos for implementing an intelligent transportation system is instead proposed in [17], where the authors use road-side cameras jointly with road traffic monitoring servers. The visual IoT idea is introduced in [18]: trying to merge multimedia content and IoT, the authors envisaged modelling the cameras as sensors for implementing smart city services. They first analyzed the requirements of the proposed visual IoT approach and then introduced an architecture enabling the visual IoT concept. Additionally, in [19], smart surveillance applications based on IoT sensors are considered. In this context, a heterogeneous network scenario is used to implement a multi-purpose real-time video surveillance application for both smart cities and border control. The proposed solution considers three main aspects: the QoS requirements of the real-time video application, the cost-benefit of the spectrum allocation, and the time constraints involved in vertical handover operations.
On the other side, multi-access edge computing (MEC) solutions in 5G scenarios have been considered in several papers, also as enablers of multimedia services [20]. For example, in [21], the authors explored the possibility of using an edge computing facility for content distribution in a smart city environment. Through MEC, the system can offer data storage and processing capabilities closer to data sources and data consumers; the authors also explored how MEC can be used when user mobility and resource allocation must be managed. The use of DASH in a cloud architecture has been examined in [22], where the authors proposed to stream a video using three different cloud computing companies operating in nine different data centres. The conclusion was that deploying DASH in the cloud is useful and enhances its performance, especially when the data centre is near the users. Another paper investigated the use of the cloud for the transcoding function [23]. It proposed a scheduling algorithm that optimizes the video transcoding time (VTT), i.e., the time needed to transcode the video, which in turn influences the video transcoding mode (VTM), i.e., the process that oversees the transcoding requests. The algorithm is shown to yield a faster transcoding process, better task management, and a continuous stream. The authors in [24] propose a partial transcoding method for content management in which each content item is encoded at different bitrates and split into segments, with the aim of minimizing a cost that includes storage and computing costs.
In [25], a method based on the analytic hierarchy process (AHP) was proposed. A multi-tier fog architecture was used, and several metrics such as bandwidth, latency, and financial cost were taken into consideration. Based on the collected data and the AHP output, the most suitable fog nodes are used for the stream. This was shown to be an advantage in terms of both QoE and delay, which is reduced remarkably; the bandwidth is also increased and the cost minimized. A similar paper looked into optimizing the number of fog nodes needed to reduce the delay [26]. It measures the latency by leveraging the k-means clustering method in different ways, calculating the distance between the fog nodes and the gateways with distinct approaches that consider the number of fog nodes and the number of gateways. It showed that there are several ways to place the fog nodes, all of which lead to much lower latency. Ref. [27] instead considered the creation of a fog network by means of an online optimization capable of improving the functions of the network. A pilot fog node shares its workload with nearby fog nodes to offload tasks: the online optimization algorithm observes the neighbouring fog nodes and picks the appropriate ones, based on their capabilities, to form the network, and an offload optimization algorithm is used to distribute the tasks. With these methods, the maximum delay in the network is effectively reduced and the task offloading is enhanced. Another paper explored the relocation of the transcoding node closer to the users [28]. The transcoding node is aware of the user's environment and receives the requested media. Since the node knows the requesting user, it can transcode the segments appropriately; this leads to a system that is flexible towards different users, more efficient with fewer segment levels, and more scalable, as it can handle more users.
  3. System Model
Let us consider a set of N users and a set of M videos available for their requests. The deployed network is composed of a set of K fog nodes; in addition, one remote cloud facility is considered. Each user can request one video with a certain quality level, so that each request is mapped to a target quality level that identifies the QoE requested by the user. We consider a limited number Q of quality levels, ordered from the worst quality to the highest quality.
Let us define each request as the tuple identifying the mth video requested with quality level q by the nth user. To leverage the distributed fog nodes, each video is assumed to be divided into segments, whose size can be fixed or can vary depending on the considered encoder and the video characteristics [29]. Moreover, transcoding and scalability functions are available in the deployed scenario. Transcoding makes it possible to receive a given video segment with a target quality lower than that of the stored video, while scalability makes it possible to request an enhancement stream carrying the missing information needed to achieve a quality higher than that of the stored video.
We can consider an allocation matrix as a function that maps the presence of the p-th segment of the m-th video on the k-th fog node. Since each segment can be stored with different quality levels, it is possible to define the allocation matrix as:
      where the condition is that the p-th segment of the m-th video is stored on the k-th fog node with quality level q. Since fog nodes are assumed to be storage-capacity constrained, each node can store only a maximum amount of data; the generic k-th node can buffer up to a fixed number of bytes.
Based on the video request table, composed of N records, each identifying the video and the quality requested by one user, each video segment is mapped to a given fog node in order to minimize the overall delay experienced by the user in receiving the requested video. To this aim, the cost of any deployment configuration can be modeled as the overall time needed to receive all the segments of the requested video with the target quality level, i.e.:
      corresponding to the time needed from the user 
n’s side to receive all the video segments of the 
mth video, where: 
      and 
 is the size of the 
-th segment of the 
m-th video with quality level 
q, 
 is the data rate between node 
i and node 
j, 
 is the target quality of user 
n, and 
 is the time requested for transcoding the segment 
 to the target quality segment 
.
Figure 1 depicts the architecture of the proposed fog-assisted DASH scenario. We consider one transcoding node, identified with index T, able to process multiple segments at the same time. This node can be an edge computing facility collocated with the access point (AP) in the area. Moreover, a remote cloud facility, identified with index 0, stores the up-scaling data needed for raising the quality of a stored segment from a lower level to level q. The data rate of the links is a function of the distance between node i and node j, such as:
      derived from the log-distance path loss model [30].
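As an illustration of how a distance-dependent link rate can be obtained from the log-distance path loss model, the following sketch may help. It is not the expression used in the paper: the Shannon-capacity mapping from path loss to rate, the function names, and every numeric default (reference loss, path loss exponent, bandwidth, transmit power, noise level) are assumptions chosen only for illustration.

```python
import math


def path_loss_db(distance_m, ref_distance_m=10.0, ref_loss_db=60.0, exponent=3.0):
    """Log-distance path loss (dB): PL(d) = PL(d0) + 10 * n * log10(d / d0)."""
    # Clamp to the reference distance, below which the model is not meant to apply.
    d = max(distance_m, ref_distance_m)
    return ref_loss_db + 10.0 * exponent * math.log10(d / ref_distance_m)


def link_rate_bps(distance_m, bandwidth_hz=20e6, tx_power_dbm=20.0, noise_dbm=-90.0):
    """Shannon-capacity estimate of the link rate (assumed mapping, hypothetical values)."""
    snr_db = tx_power_dbm - path_loss_db(distance_m) - noise_dbm
    snr_linear = 10.0 ** (snr_db / 10.0)
    return bandwidth_hz * math.log2(1.0 + snr_linear)


if __name__ == "__main__":
    for d in (10, 50, 100):
        print(f"{d:>4} m -> {link_rate_bps(d) / 1e6:.1f} Mb/s")
```

Whatever the exact mapping, the property that matters for the placement problem is that the rate decreases monotonically with the distance between the two nodes.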
Depending on how the target quality level compares with the quality level q of the p-th segment to be received, the download time can take three values:
- If the quality q is higher than the target quality, the download time corresponds to the latency from the source k-th fog node to the transcoding node, plus the transcoding time from quality q to the target quality, plus the latency from the transcoding node to the requesting user (cases 1.A and 1.B in Figure 1).
- If the quality q of the stored video matches the target quality, the download time corresponds to the latency between the k-th fog node and the requesting user (case 2 in Figure 1).
- If the quality q is lower than the target quality, the system retrieves the enhancement from the cloud to reach the requested quality level. This results in the latency of the q-level segment from the k-th node to the requesting user plus the latency of the missing part from the cloud to the requesting node (cases 3.A and 3.B in Figure 1).
As an additional case, we consider the situation in which the segment is not present in any node. In that case, we suppose that it is entirely downloaded from the cloud facility.
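The cases above can be sketched as a single function. This is a minimal sketch under stated assumptions rather than the paper's formula: the function and parameter names are hypothetical, the enhancement data are assumed to have a size equal to the difference between the target-quality and stored-quality segment sizes, and the transcoding time defaults to the 0.1 s per quality level used later in the numerical results.

```python
def segment_download_time(stored_q, target_q, size_bits, rate_node_user,
                          rate_node_tx, rate_tx_user, rate_cloud_user,
                          transcode_s_per_level=0.1):
    """Download time (s) of one segment.

    stored_q: quality stored on the fog node, or None if no node holds the segment.
    size_bits: dict mapping each quality level to the segment size in bits.
    rate_*: link data rates in bit/s (node->user, node->transcoder,
            transcoder->user, cloud->user).
    """
    if stored_q is None:
        # Segment missing from the fog: fetch it entirely from the cloud.
        return size_bits[target_q] / rate_cloud_user
    if stored_q > target_q:
        # Case 1: node -> transcoder, transcode down, transcoder -> user.
        return (size_bits[stored_q] / rate_node_tx
                + transcode_s_per_level * (stored_q - target_q)
                + size_bits[target_q] / rate_tx_user)
    if stored_q == target_q:
        # Case 2: direct delivery from the fog node.
        return size_bits[stored_q] / rate_node_user
    # Case 3: base segment from the node plus enhancement data from the cloud.
    enhancement_bits = size_bits[target_q] - size_bits[stored_q]  # assumed size
    return size_bits[stored_q] / rate_node_user + enhancement_bits / rate_cloud_user


if __name__ == "__main__":
    sizes = {1: 10e6, 2: 20e6, 3: 50e6}
    rates = dict(rate_node_user=100e6, rate_node_tx=100e6,
                 rate_tx_user=50e6, rate_cloud_user=10e6)
    for stored in (3, 2, 1, None):
        print(stored, segment_download_time(stored, 2, sizes, **rates))
```

With these illustrative numbers, a direct hit (case 2) is the fastest option and a complete miss is the slowest, which is why the placement of the segments matters.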
  4. Problem Formulation
Streaming media consumes a lot of bandwidth; moreover, with the increasing demand for high-quality videos, their size grows accordingly. Even though new compression standards have been introduced [31], compression alone is not enough. DASH helps in delivering the most suitable quality to the user with reduced bandwidth wastage [32], while fog computing distributes computation and storage over various nodes, reducing latency thanks to a wider geographical distribution. Combining the two makes it possible to break the video into segments and distribute them over the fog nodes, so that they reach the user faster and with better quality.
By assuming that the qth quality level is mapped to an average video bit rate, the delivery of the video segments is constrained to:
      in order to avoid playback interruptions during the streaming. Equation (5) states that the delivery time of the first segment should be lower than its average playout time, the delivery time of the first two segments should be lower than the playout time of both, and so on.
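The cumulative condition described for Equation (5) can be checked with a few lines of code. The sketch below is an illustration under assumptions: the function name is hypothetical, the playout time of each segment is passed in directly (in the text it follows from the segment size and the average bit rate of the quality level), and equality between cumulative delivery and playout time is treated as acceptable.

```python
from itertools import accumulate


def first_stall(delivery_times, playout_durations):
    """Return the 0-based index of the first segment whose cumulative delivery
    time exceeds the cumulative playout time, or None if no prefix violates it."""
    cumulative = zip(accumulate(delivery_times), accumulate(playout_durations))
    for index, (delivered_at, playable_until) in enumerate(cumulative):
        if delivered_at > playable_until:
            return index
    return None


if __name__ == "__main__":
    print(first_stall([1, 2, 3], [2, 2, 2]))  # every prefix is on time
    print(first_stall([1, 4, 1], [2, 2, 2]))  # the second segment arrives late
```

Note that a late segment can be compensated only by earlier segments arriving ahead of schedule, since the check is on prefixes and not on individual segments.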
At the same time, the video quality, in terms of video bit rate, is a soft requirement during the video streaming: a reduced quality is sometimes tolerated, while a playback stall caused by buffer underrun is undesired. To this aim, a relaxing assumption can be included, allowing a given segment to be retrieved with a lower quality.
Given a certain initial video segment placement, the goal of the problem is to optimize the placement of the segments in order to minimize the delivery latency, defined as:
      such that
      
      where a mapping function converts the quality level into a given amount of bytes. Constraint (7d) states that the number of segments that can be stored in the fog computing environment is smaller than the total number of segments composing all the videos. Indeed, the fog computing environment is resource limited and is able to store only a portion of the videos; the remaining segments are assumed to be stored on the remote cloud. Constraint (7e) states that, as a starting condition, the segments are randomly scattered over the fog nodes with a random quality. It is worth noticing that although (6) minimizes the overall delay, its value depends on the delay components, as defined in (2), which, in turn, depend on the allocation, following (3).
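To make the storage side of the problem concrete, the sketch below checks whether an allocation fits the per-node byte budgets. It is only an illustration: the dictionary layout of the allocation, the function names, and the example quality-to-bytes mapping are assumptions, not the notation of the constraints in (7).

```python
from collections import defaultdict


def node_usage(allocation, segment_bytes):
    """Bytes used on each fog node.

    allocation: dict {(video, segment, node): quality} of stored segments.
    segment_bytes: callable (video, segment, quality) -> size in bytes.
    """
    used = defaultdict(int)
    for (video, segment, node), quality in allocation.items():
        used[node] += segment_bytes(video, segment, quality)
    return dict(used)


def respects_capacity(allocation, segment_bytes, capacity_bytes):
    """True if no fog node stores more bytes than its capacity."""
    usage = node_usage(allocation, segment_bytes)
    return all(used <= capacity_bytes[node] for node, used in usage.items())


if __name__ == "__main__":
    # Hypothetical mapping: 1 MB per quality level, regardless of the segment.
    size_of = lambda video, segment, quality: quality * 1_000_000
    placement = {(0, 0, 0): 1, (0, 1, 0): 2, (1, 0, 1): 5}
    print(node_usage(placement, size_of))
    print(respects_capacity(placement, size_of, {0: 4_000_000, 1: 5_000_000}))
```

Any placement heuristic can call such a check before moving a segment, so that the returned allocation never exceeds what a node can buffer.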
  5. Distributed Mapping Algorithms
Since the problem outlined in (6) is NP-hard, due to the presence of multiple segments to be deployed on multiple nodes while respecting the time constraints defined in (3), we resort to two heuristic algorithms, named marginal gain maximization and quality downgrade. The first aims at minimizing the difference between the delivery deadline and the latency through a proper mapping, while the second treats the latency as a hard constraint and allows a quality downgrade when selecting the proper segment.
The marginal gain maximization algorithm exploits the fact that, to reduce the outage defined in (5), we can first assign the segments that have the highest marginal gain between the playback rate and the transmission rate, i.e.,
      
This can be implemented by selecting the best assignment among the available nodes, starting from the first segment of any requested video. We can rank all the marginal gains and assign the segments in decreasing order of gain. The best assignment is the one with the highest marginal gain, hence minimizing the outage probability.
As shown in Algorithm 1, the algorithm iterates over all the users (line 1). It is initialized by reading the video request table, the video requested by each user, and the initial segment allocation map (lines 2–4). These data are used for evaluating the delays of all the possible placements of the segments in the allocation map on all the fog nodes in the network (lines 5–9). This evaluation is then used for changing the placement of the segments so as to maximize the marginal gain of each segment (lines 11–14). The algorithm has a computational complexity of 
.
      
Algorithm 1 Marginal gain maximization.
Input: , , ,
Output:
 1: for all  do
 2:     Read the Video Request Table
 3:     Select the video m requested by the user n
 4:     Read the segments Allocation Map
 5:     for all  do
 6:         for all  do
 7:             Calculate
 8:         end for
 9:     end for
10: end for
11: for  do
12:     Find
13:
14: end for
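The ranking-and-assignment idea behind the marginal gain maximization can be illustrated with a small greedy routine. This is a sketch under assumptions, not the authors' Algorithm 1: the marginal gain is taken here as the playout deadline minus the delivery delay of a segment from a candidate node, node capacity is counted in segments, and all names are hypothetical.

```python
def greedy_marginal_gain(deadlines, delays, capacity):
    """Assign each segment to a fog node, highest marginal gain first.

    deadlines: dict {segment: playout deadline in s}.
    delays: dict {(segment, node): delivery delay in s from that node}.
    capacity: dict {node: maximum number of segments it can store}.
    Returns {segment: node}; segments left out would be served by the cloud.
    """
    # Assumed gain definition: slack between deadline and delivery delay.
    ranked = sorted(delays,
                    key=lambda pair: deadlines[pair[0]] - delays[pair],
                    reverse=True)
    free_slots = dict(capacity)
    placement = {}
    for segment, node in ranked:
        if segment in placement or free_slots[node] == 0:
            continue  # already placed, or the node is full
        placement[segment] = node
        free_slots[node] -= 1
    return placement


if __name__ == "__main__":
    deadlines = {"a": 2.0, "b": 2.0}
    delays = {("a", 0): 0.5, ("a", 1): 1.5, ("b", 0): 0.8, ("b", 1): 1.0}
    print(greedy_marginal_gain(deadlines, delays, {0: 1, 1: 1}))
```

In the example, segment "a" takes the only slot on node 0 because it has the largest slack there, and segment "b" falls back to node 1.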
The second algorithm, named quality downgrade, uses (8) to check the marginal gain for every user having its own segments on any fog node. For any segment not respecting the constraint, the quality is reduced so as to respect the latency constraint. To this aim, a decision is taken based on the evaluation of the latency outage, defined as:
The goal of the algorithm is to offer an uninterrupted stream, even if this means providing segments with a lower quality than the user requested. This is more suitable for time-sensitive applications, where stream interruptions must be avoided.
As shown in Algorithm 2, each time the marginal gain is calculated, if the latency outage condition in (9) is not respected, the quality is downgraded, down to the lowest level if necessary, and the allocation map is updated accordingly. The computational complexity of the algorithm is 
, where 
X stands for the number of iterations in the while loop.
      
Algorithm 2 Quality downgrade.
Input: , , ,
Output: , Q
 1: for  to N do
 2:     Read the Video Request Table
 3:     Select the video m requested by the user n
 4:     Read the segments Allocation Map
 5:     for all  do
 6:         for all  do
 7:             Calculate
 8:             while  and  do
 9:
10:                 Update
11:             end while
12:         end for
13:     end for
14: end for
15: for  do
16:     Find
17:
18: end for
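The inner while loop of the quality downgrade algorithm can be sketched as follows. This is an illustration under assumptions: the lowest quality level is labelled 1, the delivery time of a segment is supplied as a callable of the quality level, and the function name is hypothetical.

```python
def downgrade_until_on_time(target_q, delivery_time, deadline_s, lowest_q=1):
    """Lower the quality one level at a time until the segment meets its deadline.

    delivery_time: callable quality -> delivery time in s.
    Returns the delivered quality, never below lowest_q (assumed label 1).
    """
    quality = target_q
    while delivery_time(quality) > deadline_s and quality > lowest_q:
        quality -= 1
    return quality


if __name__ == "__main__":
    # Hypothetical model: each quality level adds 0.5 s of delivery time.
    time_for = lambda quality: 0.5 * quality
    print(downgrade_until_on_time(4, time_for, deadline_s=1.2))
```

If even the lowest level misses the deadline, the loop stops there, so the segment is still delivered late; the outage is reduced but not necessarily eliminated.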
In addition, a playout rate-based buffering technique, which takes advantage of the position of the segments and the length of the video, can work in parallel with either of the previous algorithms. It adapts the buffering to the playout rate so that the segments downloaded first belong to the first portion of the video. In this way, the video can start faster and the constraints can be relaxed for the remaining segments.
This is favourable in slow networks, where the method streams the first part of the video as fast as possible so that users can start watching, and then downloads the rest while they watch. It can be detrimental, however, if users skip the first part, as that leads to wasted bandwidth.
The two previously introduced algorithms have been compared with two baseline algorithms in order to better understand their advantages and possible issues. In the first one, named fixed allocation, the placement of the video segments on the fog nodes is fixed throughout the whole evaluation. Taking into account (1), the segments are distributed over the allocation matrix to K nodes handling P segments of M videos. Each node can handle a maximum number of segments, so some segments are not available in the fog and must be requested from the cloud. This is the simplest algorithm and can be executed without much additional cost, but it leaves room for improvement. It is a general-purpose algorithm that is most beneficial in fast networks. The second benchmark is named optimal buffering. In this case, the goal is to reallocate the originally placed segments so as to respect both (5) and the requested quality. However, in order to avoid unfeasible solutions while minimizing P1, we require that the quality always follow the one requested.
  6. Numerical Results
Numerical results were obtained in MATLAB. We created a network consisting of videos of different lengths, segmented over fog nodes and accessed by users. The following comparisons were performed to evaluate the performance of the proposed algorithms as different parameters change: varying the number of users while keeping the number of fog nodes and videos fixed, varying the number of fog nodes while keeping the number of users and videos fixed, and varying the number of videos while keeping the number of users and fog nodes fixed.
We assumed a working area of 100 m × 100 m. The data rate between nodes was set to 150 Mb/s at a distance of 10 m and decreased with increasing distance according to (4). The data rate for downloading from the cloud was fixed at 15 Mb/s.
There were 5 quality levels corresponding to different video rates, equal to 1 MB/s, 2 MB/s, 5 MB/s, 10 MB/s and 20 MB/s, respectively. The transcoding time was considered a function of the difference between the target and the original quality level; in particular, we set it to 0.1 s for each quality level to be changed. The duration of each segment was a random value between 1 s and 7 s [29]. The system was evaluated through the utility function:
      where x maps the quality of the requesting user. The users were then clustered based on the quality of the segments they request. The higher the value returned by the function, the better the performance of the system [28]. Since the grouping is performed on the quality, the utility measures how well the system is performing and how it is affected by changes in quality. The utility function was used since it is a simple way to assess the behaviour of the system.
Three scenarios for evaluating the performance were set up, as detailed in Table 1. When varying the users, the network consisted of 10 fog nodes and 10 videos, and the number of users increased from 2 to 100. When varying the nodes, the network consisted of 20 users and 10 videos, and the number of fog nodes increased from 3 to 20. When varying the videos, the network consisted of 20 users and 10 fog nodes, and the number of videos increased from 5 to 25. In all these cases, the videos can be short (i.e., 10 segments), medium (i.e., 100 segments) or long (i.e., 1000 segments). In order to have a fair comparison among the three video length cases, we fixed the capacity of each fog node to 20% of all the segments of all the videos present in the system.
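The scenario parameters and the resulting per-node capacity can be written down compactly. The sketch below restates the values given in the text; the dictionary layout and the function name are illustrative, and it assumes that all videos in a run have the same length, which the text does not state explicitly.

```python
# Ranges are (minimum, maximum); the step used in the sweep is not given in the text.
SCENARIOS = {
    "variable_users": {"users": (2, 100), "fog_nodes": 10, "videos": 10},
    "variable_nodes": {"users": 20, "fog_nodes": (3, 20), "videos": 10},
    "variable_videos": {"users": 20, "fog_nodes": 10, "videos": (5, 25)},
}

SEGMENTS_PER_VIDEO = {"short": 10, "medium": 100, "long": 1000}


def node_capacity_segments(num_videos, segments_per_video, fraction=0.20):
    """Capacity of each fog node, in segments: 20% of all segments in the system."""
    return round(fraction * num_videos * segments_per_video)


if __name__ == "__main__":
    for length, segments in SEGMENTS_PER_VIDEO.items():
        print(length, node_capacity_segments(10, segments))
```

For 10 videos, each node therefore holds 20, 200, or 2000 segments in the short, medium, and long cases, so the relative storage pressure is the same across the three video lengths.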
  6.1. Variable Users
The first set of results refers to the scenario where the number of users varies, for all three video lengths. Three types of results were considered for testing the performance of the proposed solutions. The first one, shown in Figure 2, is the average downloading time of the whole videos. As expected, the longer the videos are, the higher the downloading time is. The quality downgrade algorithm has, in general, the lowest downloading time, except when the number of users is small. Although this seems a good result, we stress that the quality downgrade algorithm minimizes the downloading time at the cost of reducing the quality of the requested videos. It is also interesting to notice that optimal buffering has a higher video downloading time than marginal gain maximization. As the following figures make clearer, this is because optimal buffering aims at respecting the buffering constraints rather than merely maximizing the marginal gain.
Indeed, by analyzing the probability of not respecting the downloading time constraint, i.e., of incurring a potential under-buffering, as depicted in Figure 3, it is possible to notice that optimal buffering reduces the outage probability as much as possible. As the number of users increases, the quality downgrade algorithm may perform better. However, we stress once again that in this case the quality is reduced, while in all the other cases the quality is always the one requested by the users. The marginal gain maximization performs better than the fixed allocation, while optimal buffering performs better still.
As a third result, we show the utility function (Figure 4). The outcome is not surprising: the quality downgrade algorithm reduces the quality requested by the users, hence its utility is lower, while in all the other cases the quality is unchanged and the same utility is obtained.
Finally, we show the probability of quality downgrade events, calculated as the number of quality levels downgraded over the transmitted segments divided by the total number of segments. These values make clear that the quality downgrade algorithm reduces the downloading time at the cost of not respecting the requested quality. In particular, when the number of users is high, several segments do not respect the requested quality, even though the probability of delivering them in time is higher, as previously shown. The values are reported in Table 2.
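One possible reading of the downgrade metric described above is sketched below: the quality levels lost on each transmitted segment are summed and divided by the total number of segments. The function name and the per-segment list representation are assumptions, and the exact normalization used for the reported values may differ.

```python
def downgrade_ratio(requested, delivered):
    """Downgraded quality levels per transmitted segment.

    requested, delivered: equal-length sequences of quality levels, one per segment.
    """
    if len(requested) != len(delivered):
        raise ValueError("requested and delivered must have the same length")
    if not requested:
        return 0.0
    lost_levels = sum(max(want - got, 0) for want, got in zip(requested, delivered))
    return lost_levels / len(requested)


if __name__ == "__main__":
    print(downgrade_ratio([3, 3, 2, 5], [3, 2, 2, 3]))
```

Under this reading the value can exceed 1 when segments lose more than one level on average, so it behaves as an average downgrade depth rather than a strict probability.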
  6.2. Variable Nodes
Similar results can be seen in Figure 5, Figure 6 and Figure 7. As expected, the performance improves as the number of nodes increases. As in the previous case, the optimal buffering algorithm minimizes the outage probability and, despite having a higher overall video download time than the other algorithms, is sufficient for avoiding under-buffering. The marginal gain maximization again offers a trade-off, reducing the download time while having a lower outage than the fixed allocation.
A similar trend can also be seen in Table 3, where we report the probability of downgraded quality levels. The downgrade events are still significant and are needed for keeping the downloading time low. Additionally, when the number of nodes increases, the downgrade events decrease, thanks to the presence of more nodes, which reduces the probability of requesting segments from the cloud.
  6.3. Variable Videos
Finally, we considered the performance when the number of videos stored in the system changes (Figure 8, Figure 9 and Figure 10). In this case, the downloading performance is quite flat for all the algorithms except the marginal gain maximization. This is because, when the number of videos increases, the marginal gain maximization can better exploit the reduced ratio of users per video, hence decreasing the time needed for downloading them. Results similar to the previous cases are obtained for both the outage probability and the utility function.
A similar trend can also be seen in Table 4, where we report the probability of downgrades in quality level. The downgrade events are still significant and are needed for keeping the downloading time low. Additionally, when the number of videos increases, the downgrade events decrease, thanks to the reduced number of user requests per video, which allows a better allocation for each of them.