Article

Transitioning Broadcast to Cloud

Brightcove Inc., 290 Congress Street, Boston, MA 02210, USA
*
Author to whom correspondence should be addressed.
Appl. Sci. 2021, 11(2), 503; https://doi.org/10.3390/app11020503
Received: 13 November 2020 / Revised: 23 December 2020 / Accepted: 28 December 2020 / Published: 6 January 2021
(This article belongs to the Special Issue Multimedia Cloud Computing)

Abstract

We analyze the differences between on-premise broadcast and cloud-based online video delivery workflows and identify technologies needed for bridging the gaps between them. Specifically, we note differences in ingest protocols, media formats, signal-processing chains, codec constraints, metadata, transport formats, delays, and means for implementing operations such as ad-splicing, redundancy and synchronization. To bridge the gaps, we suggest specific improvements in cloud ingest, signal processing, and transcoding stacks. Cloud playout is also identified as critically needed technology for convergence. Finally, based on all such considerations, we offer sketches of several possible hybrid architectures, with different degrees of offloading of processing in cloud, that are likely to emerge in the future.

1. Introduction

Terrestrial broadcast TV was historically the first, and is still a broadly used, technology for delivering visual information to the masses. Cable and DTH (direct-to-home) satellite TV technologies came next, as highly successful evolutions and extensions of the broadcast TV model [1,2].
Yet, broadcast has some limits. For instance, in its most basic form, it only enables linear delivery. It also provides direct reach to only one category of devices: TV sets. To reach other devices, such as mobiles, tablets, PCs, game consoles, etc., the most practical option currently available is to send streams Over the Top (OTT). The OTT model utilizes the IP connections that many such devices already have, and internet streaming as a delivery mechanism [3,4,5,6,7]. The use of OTT/streaming also makes it possible to implement interactive, non-linear, time-shifted TV, or DVR types of services.
Considering all such benefits and conveniences, many companies in the broadcast ecosystem are now increasingly adding OTT services, complementing their traditional (e.g., terrestrial, cable, satellite) services or distribution models [8,9,10,11]. At a broader scale, we must also recognize new standards and industry initiatives such as HbbTV [12], as well as ATSC 3.0 [13,14], which are further blurring the boundaries between traditional broadcast and OTT.
In other words, we now live in an era where hybrid broadcast + OTT distribution is becoming the norm, which brings us to the question of how such hybrid systems can be deployed and operated most efficiently.
At present, the two extreme choices are:
  • on-prem: everything, including playout systems, encoders, multiplexers, servers, and other equipment for both broadcast and OTT distribution is installed and operated on premises, and
  • cloud-based: almost everything is turned into software-based solutions, and operated using infrastructure of cloud service providers, such as AWS, GCP, Azure, etc.
The on-prem model is well-known: this is how all traditional broadcast systems have always been built and operated. The cloud-based approach is a more recent development. It requires a considerably different (modular, software-only) implementation of the system, but in the end it brings a number of significant advantages: it minimizes investments in hardware, allows pay-as-you-go operation, simplifies management and upgrades, and makes the whole design more flexible and future-proof [15,16,17].
Furthermore, the use of cloud has already been proven to be highly scalable, reliable, and cost-effective for implementing OTT/streaming delivery. Today, cloud already powers mass-scale online video services, such as YouTube and Netflix, as well as online video platforms (OVPs)—Brightcove, Kaltura, thePlatform, etc. [9,10,11]. Besides enabling basic streaming functionality, OVPs also provide means for content management, ad-insertions, analytics, client SDKs, and even automatic generators of apps for all major platforms. They basically provide turn-key solutions for OTT services of all sorts.
However, while transition of OTT services to cloud is no longer a challenge, the offload of traditionally on-prem functions of broadcast systems, such as ingest, content management, master control/playout, distribution encoding, etc., is a topic that we believe deserves additional discussion. As we will show in this paper, there are many important differences in ways video processing is currently done in cloud vs. on-prem broadcast, as well as technologies that may be needed to bridge the gaps between them.

2. Processing Chains in Broadcast and Cloud-Based Online Video Systems

In this section, we will study commonalities and differences between processing chains in broadcast and online video systems. We focus on functions, formats, means for implementation of certain operations, and overall system characteristics such as reliability, processing granularity, and delays.
The idealized chain of processing in broadcast distribution is shown in Figure 1, and the chain of processing in an online video system is shown in Figure 2. Both are conceptual and high-level.

2.1. Main Functions and Distribution Flows

In broadcast, everything is based around processing and delivery of a set of live streams, visible to end users as “TV channels”. As shown in Figure 1, the selection or scheduling of input feeds that go in each channel is done by master control or playout systems. Such systems also insert graphics (e.g., channel logos or “bugs”), slots for ads, captions, metadata, etc.
After playout, all channels are subsequently encoded and passed to a multiplexer, which combines them into a multi-program transport stream (aka TS or MPEG-2 TS [17]) intended for distribution. In addition to the channels’ media content, the final multiplex TS also carries program and system information (PSIP [18]), SCTE-35 ad markers [19,20], and other descriptors required for broadcast distribution [21].
As further shown in Figure 1, the distribution chain in broadcast systems may have multiple tiers—from main network center to local stations and also multichannel video programming distributors (MVPDs), such as cable or satellite TV companies. At each stage, media streams corresponding to each channel can be extracted, modified (e.g., by adding local content or ads), re-multiplexed into a new set of channels, with new program tables and other metadata inserted, and then again sent down to distribution or next headend.
In other words, we see that broadcast systems are effectively responsible for both the formation of the content, turning it into a set of channels, and the subsequent distribution of that content to end users.
In contrast, online video platforms are used primarily for distribution. They assume that content is already fully formed. As shown in Figure 2, live inputs are typically turned into live output streams, and pre-recorded media files are typically published as VOD assets. They transcode and repackage inputs into HLS [5], DASH [6], or MSS [7] streaming formats, and then pass them to Content Delivery Networks (CDNs) for propagation and delivery to end user devices (streaming clients). In some cases, OVPs may also be used for live to VOD conversions and VOD content management, but not for formation of live streams.
Another important difference between online video systems and broadcast is the availability of the feedback chain. In Figure 2 it is shown by contour arrows connecting players and CDNs to the analytics module within OVP. This module collects playback and CDN usage statistics, and turns them into metrics used for ad-monetization/billing, operations-control, and optimization purposes [22,23].

2.2. Contribution and Ingest

In broadcast, live input streams (or “feeds”) originate from remote or field production. They are normally encoded by a contribution encoder, and delivered to broadcast center over a certain physical link (dedicated IP, satellite, 4G, etc.). The encoding is always done using one of the standard TV formats (e.g., 480i SD or 1080i HD), and with a number of codec- and TS-level constraints applied, making such streams compatible with broadcast systems [24,25,26,27,28,29,30,31,32]. When streams are sent over IP, real-time UDP (User Datagram Protocol)-based delivery protocols are normally used. Examples of such protocols include RTP [33], SMPTE 2022-1 [34], SMPTE 2022-2 [35], Zixi [36], etc.
Pre-recorded content usually comes in form of files, produced by studio encoders. Again, only standard TV/broadcast video formats are used, and specific codec- and container-level restrictions are applied (see e.g., [37]). Moreover, in most cases, the contribution (or so-called “mezzanine”) encodings are done at rates that are considerably higher than rates used for final distribution. This allows broadcast systems to start with “cleaner” versions of the content.
In case of online video platforms, input content generally comes from a much broader and more diverse set of sources—from professional production studios and broadcast workflows to user-generated content. Consequently, the bitrates, formats, and quality of such streams can also vary greatly. This forces OVPs to be highly versatile, robust, and tolerant on the ingest end.
The quality of links used to deliver content to OVPs may also vary greatly, from dedicated connections to datacenters to the public Internet over local ISPs. UDP may or may not be available.
In such a context, the live ingest protocol that has become most commonly used, remarkably enough, is RTMP (Real-Time Messaging Protocol) [38]. This is an old, Flash-era protocol with many known limitations, but it works over TCP (Transmission Control Protocol) and remains a popular choice for live-to-cloud ingest. Other protocols for live ingest include SRT (Secure Reliable Transport) [39] and RIST (Reliable Internet Stream Transport) [40].

2.3. Video Formats and Elementary Streams

We next look at characteristics of video formats used in both broadcast and online video systems. The summary of this comparison is provided in Table 1. For simplicity, in this comparison, we only consider SD and HD TV systems.
As shown in this table, SD systems almost always use an interlaced, bottom-field-first (bff) sampling format [26,30,41]. If the source is progressive (e.g., film), it is typically converted to interlaced form by the so-called telecine process [1,2]. HD systems also use interlaced formats, but with top-field-first (tff) order. HD systems can also carry progressive formats (e.g., 720p). In contrast, in internet streaming, only progressive formats are normally used [42,43,44].
In terms of color parameters and Sample Aspect Ratios (SARs), streaming video formats are well aligned with HDTV systems. On the other hand, streaming of SD content requires both color- and SAR-type conversions.
The primary reason why streaming systems are more restrictive is compatibility with a wide range of possible receiving devices (mobiles, tablets, PCs, etc.) [42]. In many such devices, graphics stacks are simply not designed to properly render interlaced video, colors other than sRGB/ITU-R BT.709, or non-square pixels. This forces color-, temporal sampling type-, and SAR-type conversions.
In Table 2, we further analyze characteristics of encoded video streams (or elementary streams) used for broadcast and streaming distribution. Again, consideration is limited to SD and HD systems.
First, we notice that the number of encoded streams is different. In broadcast, each channel is encoded as a single stream. In streaming, each input is encoded into several output streams with different resolutions and bitrates. This is needed to accommodate adaptive bitrate (ABR) delivery.
There are also differences in codecs, encoding modes, and codec constraints. For example, in broadcast, the use of constant bitrate (CBR) encoding is most common [24]. It forces the codec to operate at a certain target bitrate, matching the amount of bandwidth allocated to a particular channel. The use of variable bitrate (VBR) encoding in broadcast is rare and only allowed in the so-called statistical multiplexing (or statmux) regime [59,60], where the multiplexer effectively drives dynamic bandwidth allocation across all channels in a way that keeps the total sum of their bitrates constant. In streaming, there is no need for CBR or statmux modes. All streams are typically VBR-encoded, with some additional constraints applied on decoder buffer size and maximum bitrate [44].
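The statmux principle above can be sketched as a proportional bandwidth allocator: each channel receives a guaranteed floor, and the remaining multiplex capacity is split in proportion to instantaneous coding complexity, so the sum of allocations stays constant. This is a minimal illustration; the complexity figures and the floor value are hypothetical, not taken from any particular multiplexer.

```python
def statmux_allocate(complexities, total_kbps, floor_kbps=500):
    """Allocate a fixed multiplex bandwidth across channels.

    Each channel gets floor_kbps, plus a share of the remaining
    capacity proportional to its instantaneous coding complexity.
    The sum of allocations always equals total_kbps.
    """
    spare = total_kbps - len(complexities) * floor_kbps
    total_c = sum(complexities)
    return [floor_kbps + spare * c / total_c for c in complexities]

# Three channels competing for a 3800 kbps multiplex;
# the busiest channel (complexity 3.0) gets the largest share.
alloc = statmux_allocate([1.0, 2.0, 3.0], total_kbps=3800)
```

Note that a real statmux controller would re-run such an allocation on every rate-control interval, driven by complexity estimates reported by the encoders.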
GOP (Group of Pictures) lengths are also significantly different. In broadcast, GOPs are typically 0.5 s long, as required for fast channel switching. In streaming, GOPs can be 2–10 s long, typically limited by the lengths of segments used for delivery.
Broadcast streams also carry more metadata. They typically include relevant video bitstream verifier (VBV) [54] or hypothetical reference decoder (HRD) parameters [55], as well as picture structure-, picture timing-, and colorimetry-related information [55]. They also carry CEA 608/708 closed captions [61,62] and active format descriptor (AFD)/bar data information [63,64,65]. For streaming, most of this metadata may be omitted.
Finally, there are also important differences in pre-processing. Broadcast encoders are famous for the use of denoisers, MCTF-filters, and other pre-processing techniques applied to make compression more efficient [24,53]. In streaming, the use of such techniques is only beginning to emerge.

2.4. Distribution Formats

As mentioned earlier, in broadcast, distribution is always done using MPEG-2 transport streams [17]. They carry audio and video elementary streams, program and system information [18], SCTE-35 ad markers [19], and other metadata as prescribed by relevant broadcast standards and guidelines [21,25,27,30]. TS in cable systems may also carry EBPs and other cable-specific metadata.
In streaming, things are more diverse. There are several streaming formats and standards currently in use. The most prevalent ones, as of time of writing, are:
  • HTTP Live Streaming (HLS) [5],
  • Dynamic Adaptive Streaming over HTTP (DASH) [6], and
  • Microsoft Smooth Streaming (MSS) [7].
There are also several different types of digital rights management (DRM) technologies. The most commonly used ones are:
  • FairPlay [66],
  • PlayReady [67], and
  • Widevine Modular [68].
The support for these technologies varies across different categories of receiving devices. Hence, in order to reach all devices, multiple streaming formats and combinations of formats and DRM technologies must be supported. We show a few common choices of such combinations in Table 3.
HLS, DASH, as well as MSS use multi-rate, segment-based representation of media data. Original content is encoded at several different resolutions and bitrates, and then split in segments, each starting at GOP boundary, such that they can be retrieved and decoded separately. Along with media segments (either in TS [17], ISOBMFF [69], or CMAF [70] formats) additional files (usually called manifests, playlists, or MPD (Media Presentation Descriptor) files) are provided, describing locations and properties of all such segments. Such manifests are used by players (or streaming clients) to retrieve and play the content.
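The multi-rate representation described above can be illustrated by building an HLS master playlist that advertises the ABR ladder to clients. This is a minimal sketch: the bitrates, resolutions, and URIs are hypothetical placeholders, and a production packager would also emit codec strings, frame rates, and audio groups.

```python
def master_playlist(renditions):
    """Build a minimal HLS master playlist.

    `renditions` is a list of (bandwidth_bps, resolution, uri) tuples,
    one per encoded variant of the content (the ABR ladder).
    """
    lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
    for bw, res, uri in renditions:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bw},RESOLUTION={res}")
        lines.append(uri)
    return "\n".join(lines) + "\n"

# A hypothetical three-rung ladder for one piece of content.
text = master_playlist([
    (800000,  "640x360",   "360p/playlist.m3u8"),
    (3000000, "1280x720",  "720p/playlist.m3u8"),
    (6000000, "1920x1080", "1080p/playlist.m3u8"),
])
```

Each referenced media playlist would in turn list that rendition's segments, each beginning at a GOP boundary so the player can switch rungs between segments.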
The carriage of metadata in streaming systems is also more diverse. Some metadata can be embedded in media segments, while others may also be embedded in manifests, carried as additional “sidecar” tracks of segment files, or as “event” messages [6], or ID3 tags [71].
For example, in addition to “broadcast-style” carriage of CEA 608/708 [62,63] closed captions in video elementary streams, it is also possible to carry captions as separate tracks of WebVTT [72] or TTML [73] segments, or as IMSC1 timed text data [74] encapsulated in XML or ISOBMFF formats [45]. The preferred way of carriage depends on player capabilities, and may vary for different platforms.
The SCTE-35 information may be carried only at the manifest level in HLS, by either manifest or in-band events in MPEG-DASH, and only in-band in MSS [7,45,75].
To manage such broad diversity of formats, DRMs, and metadata representations, online video platforms commonly deploy so-called dynamic or just-in-time (JIT) packaging mechanisms [23]. This is illustrated by the architecture shown in Figure 2. Instead of proactively generating and storing all possible permutations of packaged streams on the origin server, such a system stores all VOD content in a single intermediate representation which allows fast transmuxing to all desired formats. The origin server works as a cache/proxy, invoking JIT transmuxers to produce each version of content only if there is a client device that requests it. Such logic is commonly accompanied by dynamic manifest generation, matching the choices of formats, DRMs, and metadata representations to the capabilities of the devices requesting them. This reduces the amount of cloud storage needed and also increases the efficiency of CDN usage when handling multiple content representations [23].
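The JIT-packaging origin logic described above amounts to a cache keyed by (asset, format) that invokes the transmuxer only on a miss. The sketch below illustrates this control flow only; the `transmux` callable stands in for a real transmuxer and is purely hypothetical.

```python
class JitOrigin:
    """Origin cache that packages content just-in-time.

    A given (asset, output format) combination is transmuxed from the
    intermediate representation only the first time a client requests
    it; subsequent requests are served from cache.
    """
    def __init__(self, transmux):
        self.transmux = transmux   # callable: (asset, fmt) -> packaged object
        self.cache = {}
        self.misses = 0

    def get(self, asset, fmt):
        key = (asset, fmt)
        if key not in self.cache:          # first request: invoke JIT packager
            self.misses += 1
            self.cache[key] = self.transmux(asset, fmt)
        return self.cache[key]             # later requests: cache hit

# Hypothetical transmuxer that just tags the asset with its target format.
origin = JitOrigin(lambda asset, fmt: f"{asset} packaged as {fmt}")
```

In a real deployment, the cache would of course be the CDN-facing origin storage with eviction, and the transmuxer a separate service, but the request-driven structure is the same.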
As easily observed, the delivery formats and their supporting systems in OTT/streaming are completely different from those used in broadcast.

2.5. Ad Processing

In broadcast systems, there are several types of ad slots: some are local and expected to be filled by local stations, while others are regional or global and are filled earlier in the delivery chain.
In all cases, insertions are done by splicing ads into the distribution TS streams, aided by SCTE-35 [19] ad markers. Such markers (or cue tones) are inserted earlier, at the playout or even production stages [20]. Ad splicers subsequently look for SCTE-35 markers embedded in the TS, and then communicate with ad servers (normally over SCTE 30 [76]) to request and receive the ad content that needs to be inserted. They then update the TS streams to insert segments of ad content. Such a TS update is actually a fairly tedious process, involving re-muxing, regeneration of timestamps, etc. It also requires both the main content and the ads to be consistently encoded: to have exactly the same codec parameters, HRD model, etc. (see e.g., [77]).
In the online/streaming world, ad-related processing is quite different. The ads are usually inserted/personalized on a per-stream/per-client basis, and the results of viewers watching the ads (so-called ad-impressions) are also registered, collected, and subsequently used for monetization. It is all fully automated and has to work in real-time and at mass scale.
There are two models for ad-insertion that are used in streaming currently: server-side ad-insertion (SSAI) and client-side ad insertion (CSAI) [45]. In case of CSAI, most ad-related processing resides in a client. The cloud only needs to deliver content and SCTE-35 cue tones to the client. This scales well regardless of how cue tones are delivered—both in-band, or in-manifest carriage methods are adequate.
In case of SSAI, most ad-related processing resides in cloud. To operate it at high scale and reasonable costs, such processing has to be extremely simple. In this context, in-manifest carriage of SCTE-35 cue tones is strongly preferred, as it allows ad-insertions to be done by manipulation of manifests.
For example, in case of HLS, SCTE-35 markers in HLS playlists become substituted with sections containing URLs to ad-content segments, with extra EXT-X-DISCONTINUITY markers added at beginning and end of such sections [75]. In case of MPEG DASH, essentially the same functionality is achieved by using multiple periods [45]. The discontinuity markers or changing periods are effectively forcing clients to reset decoders when switching between program and ad content. This prevents possible HRD buffer overflows and other decodability issues during playback.
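The HLS manifest manipulation described above can be sketched as a simple substitution pass over the playlist. This is an illustrative simplification: the cue-tone tag name, segment URIs, and durations here are placeholders, and a real SSAI service would also parse the SCTE-35 payload, personalize the ad decision per client, and rewrite media sequence numbers.

```python
def splice_ads(playlist_lines, ad_segments, cue_tag="#EXT-X-CUE-OUT"):
    """Replace each cue-tone line in an HLS media playlist with an ad break.

    `ad_segments` is a list of (duration_seconds, uri) tuples. The ad
    break is bracketed by EXT-X-DISCONTINUITY markers so the player
    resets its decoder when entering and leaving the ad content.
    """
    out = []
    for line in playlist_lines:
        if line.startswith(cue_tag):
            out.append("#EXT-X-DISCONTINUITY")      # program -> ad boundary
            for dur, uri in ad_segments:
                out.append(f"#EXTINF:{dur:.3f},")
                out.append(uri)
            out.append("#EXT-X-DISCONTINUITY")      # ad -> program boundary
        else:
            out.append(line)
    return out

# Hypothetical playlist with one cue tone, spliced with a two-segment ad.
program = ["#EXTM3U", "#EXTINF:6.000,", "seg1.ts",
           "#EXT-X-CUE-OUT:DURATION=30", "#EXTINF:6.000,", "seg2.ts"]
spliced = splice_ads(program, [(15.0, "ad1.ts"), (15.0, "ad2.ts")])
```

Because only the manifest changes, the media segments themselves are shared across all viewers, which is what makes this approach scale.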

2.6. Delay, Random Access, Fault Tolerance, and Signal Discontinuities

In broadcast systems, many essential signal processing operations—format conversions, editing, switching, etc., are normally done with uncompressed video streams, carried over by SDI (Serial Digital Interface) [78,79], or more recently by SMPTE (Society of Motion Picture and Television Engineers) 2110 [80] over IP. This enables all such operations to be performed with extremely short delays and with frame-level temporal precision. When redundant processing chains are employed, the switching between them in SDI domain also happens seamlessly. When streams are encoded, this increases random access granularity to about 0.5 s, which is a typical GOP length in broadcast streams.
In streaming, as discussed earlier, the delivery of encoded videos to clients is done using segments. Such segments cannot be made arbitrarily small due to CDN efficiency reasons. In practice, 2-, 6-, and 10-s segments are most commonly used [44]. Same segmented media representations are also commonly used internally in cloud-based processing workflows. This simplifies exchanges, avoids additional transcoding or transmuxing operations, and reduces many basic stream-level operations to manifest updates. However, such design also makes random access and delay capabilities in cloud video systems much worse compared to broadcast.
What also complicates things in the cloud is the distributed and inhomogeneous nature of processing resources. For instance, physical servers (or cloud “instances”) responsible for running video processing tasks may be located in different datacenters, have somewhat different hardware characteristics, non-synchronized local clocks, etc. The network-induced delays in accessing such instances may also differ. Processing jobs thus have to be scheduled dynamically and in anticipation of all such possible differences. Moreover, cloud instances may occasionally become unstable, non-responsive, or be terminated by the cloud service provider. These are rare, but relatively “normal” events. Cloud workflows must be designed to be “immune” to such events.
To illustrate how fault tolerance in cloud may be achieved, in Figure 3 we show an example of a live streaming system with 2-way redundancy introduced. There are two contribution feeds, marked as A and B, respectively, and two processing chains, including ingest, transcoding, and packaging stages. The outputs of the packagers are DASH or HLS media segments and manifests. This system also deploys two redundancy control modules. These modules check whether manifest and segment updates along routes A and B are arriving at the expected times, and if so, they leave the manifests unchanged. However, if they detect that either of the processing chains has become non-functional, they update the manifest to include a discontinuity marker, and then continue with segments arriving from the alternative path.
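The redundancy control logic just described can be sketched as a small state machine: each chain reports a heartbeat whenever its manifest updates, and the controller fails over (and flags a pending discontinuity marker) when the active chain goes stale. The timeout value and two-chain naming here are illustrative assumptions, not taken from a specific product.

```python
class RedundancyControl:
    """Fail over between two packager chains (A primary, B backup).

    Each chain calls heartbeat() when its manifest updates. select()
    returns the chain to serve from; if the active chain's manifest has
    not updated within `timeout` seconds, it switches to the other
    chain and records that a discontinuity marker must be inserted.
    """
    def __init__(self, timeout=10.0):
        self.timeout = timeout
        self.active = "A"
        self.last_update = {}
        self.discontinuity_pending = False

    def heartbeat(self, chain, now):
        self.last_update[chain] = now

    def select(self, now):
        last = self.last_update.get(self.active, float("-inf"))
        if now - last > self.timeout:          # active chain went stale
            self.active = "B" if self.active == "A" else "A"
            self.discontinuity_pending = True  # mark switch in manifest
        return self.active

# Both chains healthy at t=0; chain A then stops updating.
rc = RedundancyControl(timeout=10.0)
rc.heartbeat("A", 0.0)
rc.heartbeat("B", 0.0)
```

Note that `discontinuity_pending` is exactly the signal that would cause the EXT-X-DISCONTINUITY tag (or a new DASH period) to be written into the output manifest at the fallback point.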
As easily observed, this system remains operational if either chain A or B fails. It also stays operational in case of a failure of one of the redundancy control units. However, it is important to note that in case of a failure, the switch between videos in chains A and B may not be perfectly time-aligned. The output stream will be decodable, but it may exhibit a time-shift discontinuity in media content at the time of the switch/fallback. This is a consequence of operating in a distributed system with variable delays and different processing resources along chains A and B. Naturally, with some additional effort, the magnitude of such misalignment could be minimized, but this would necessarily increase the complexity and delay of the system. Perfect synchronization, in principle, is one of the most challenging problems in the cloud.
The observed differences in delays, random access granularity, and possible discontinuities in signals coming from cloud-based workflows are among the most critical factors that must be considered when planning the migration of signal processing functionality to the cloud.

3. Technologies Needed to Support Convergence

We next discuss measures that we believe must be taken to make cloud-based video platforms more compatible with broadcast systems.

3.1. Cloud Contribution Links and Protocols

As mentioned earlier, cloud-based video platforms typically use RTMP [38] as the protocol for live ingest. It is an old, Flash-era protocol, performing internal demuxing and carrying audio and video data as separate streams sent over TCP [38]. It has no control over latencies, alters PTS (presentation timestamp) and DTS (decoding timestamp) values, and makes it very difficult to carry SCTE-35 and other important metadata. In other words, it is basically inadequate for integration with broadcast workflows.
Things can be done better. Nowadays most cloud systems can accept UDP traffic, enabling the use of protocols such as RTP [33], RTP+SMPTE 2022-1 [34], RTP+SMPTE 2022-2 [35], Zixi [36], SRT (Secure, Reliable Transport) [39] or RIST (Reliable Internet Stream Transport) [40]. Such protocols can carry unaltered transport streams from contribution encoders or broadcast workflows to the cloud. Some of these protocols can also be used to send information back from cloud to the broadcast systems.
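A basic sanity check performed on such UDP ingest paths can be sketched as follows: transport streams consist of 188-byte packets, each beginning with the sync byte 0x47, so a receiver can validate that each arriving datagram carries an integral number of aligned TS packets before passing it downstream. This is a minimal sketch of the validation step only, not of a full RTP/SMPTE 2022 receiver.

```python
TS_PACKET_SIZE = 188   # MPEG-2 TS packet length in bytes
TS_SYNC_BYTE = 0x47    # first byte of every TS packet

def validate_ts_datagram(payload):
    """Check that a UDP datagram carries aligned MPEG-2 TS packets.

    Returns True if the payload length is a non-zero multiple of 188
    bytes and every packet boundary starts with the 0x47 sync byte,
    as expected from TS-over-UDP contribution feeds.
    """
    if len(payload) == 0 or len(payload) % TS_PACKET_SIZE != 0:
        return False
    return all(payload[i] == TS_SYNC_BYTE
               for i in range(0, len(payload), TS_PACKET_SIZE))

# RTP-based contribution commonly carries 7 TS packets per datagram.
datagram = bytes([TS_SYNC_BYTE] + [0] * 187) * 7
```

A failed check would typically trigger resynchronization on the sync byte, or a retransmission request when a protocol such as SRT or RIST is in use.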
What also ultimately helps with achieving reliable ingest (as well as all other exchanges between broadcast on-prem systems and cloud) is the use of dedicated connections to cloud datacenters. Such dedicated links can be established with most major cloud operators (see e.g., AWS Direct Connect [81], or Azure ExpressRoute [82]).

3.2. Signal Processing

As also mentioned earlier, broadcast workflows may carry videos in progressive, interlaced, or telecine formats. Field orders and pulldown patterns may also differ across different sources. When such signals are edited or combined together, this produces output with a changing temporal sampling type. If such videos are then encoded and delivered as interlaced, they may still look fine on traditional TV sets. However, if one receives such interlace-encoded signals and then “naively” tries to convert them to progressive, the results can be disastrous: e.g., a wrong assumption about field order can make videos look jerky, and a failure to detect 3:2 pulldowns can produce periodic garbled frames.
What also makes broadcast signals difficult to work with is the accumulation of compression and conversion artifacts. The further down the chain the signal is obtained, the “noisier” it becomes.
To work with such complex signals, a proper processing stack is needed. One possible architecture is illustrated in Figure 4. It includes a content analysis module, which performs detection of segment cuts and identifies types of temporal sampling patterns and artifacts in each segment. Such information, along with bitstream metadata is then passed to a chain of filters, including artifact removal, temporal sampling conversion, color space conversion, and scaling filters.
The artifact removal filters, such as deblocking and denoising operations, are among the most basic techniques needed to work with broadcast signals. Deblocking filters are needed, e.g., when working with MPEG-2 encoded content, as the MPEG-2 codec [54] does not have in-loop filters, and passes all such artifacts to the output. In Figure 5, we show how such artifacts look, along with the cleaned output produced by our deblocking filter. Denoising is also needed, especially when working with older (analog-converted) SD signals. Removal of low-magnitude noise not only makes the signal cleaner, but also makes the job of the subsequent encoder easier, enabling it to achieve better quality or a lower rate. We illustrate this effect in Figure 6.
Temporal sampling conversion filter in Figure 4 performs conversions between progressive, telecine, and interlace formats, as well as temporal interpolation and resampling operations. As discussed earlier, this filter is driven by information from the content analysis module. This way, e.g., telecine segment can be properly converted back to progressive, interlaced, properly deinterlaced, etc.
The quality of temporal sampling conversion operations is very critical. For example, in Figure 7, we show the outputs of a basic deinterlacing filter (FFMPEG “yadif” filter [83]) and more advanced optical-flow-based algorithm [84]. It can be seen that a basic deinterlacer cannot maintain continuity of field lines under high motion. The effects of such nature can be very prominent in sports broadcast content.
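The basic deinterlacing path mentioned above can be reproduced with FFmpeg's "yadif" filter. The sketch below only assembles the command line (the input and output filenames are hypothetical); `mode=1` emits one output frame per field, doubling the frame rate (e.g., 1080i59.94 to 1080p59.94), while `mode=0` emits one frame per input frame.

```python
def yadif_command(src, dst, send_field=True):
    """Assemble an FFmpeg command that deinterlaces with the yadif filter.

    send_field=True maps to yadif mode=1 (one output frame per field);
    send_field=False maps to mode=0 (one output frame per frame).
    """
    mode = 1 if send_field else 0
    return ["ffmpeg", "-i", src, "-vf", f"yadif=mode={mode}", dst]

# Hypothetical invocation; run with subprocess.run(cmd) if FFmpeg is installed.
cmd = yadif_command("interlaced_input.ts", "progressive_output.mp4")
```

As the paper notes, such a filter is a baseline: for demanding content such as sports, motion-compensated or optical-flow-based deinterlacers [84] produce visibly better field-line continuity.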
The use of the subsequent filters in Figure 4, such as the color space conversion and scaling filters, is driven by possible differences in color spaces, SARs, and resolutions between the input and output formats.
All such conversion operations need to be state of the art, or at least comparable in quality to Teranex [85], Snell & Wilcox/Grass Valley KudosPro [86], and other standards-converter boxes commonly used in post-production and broadcast.

3.3. Broadcast-Compliant Encoding

As discussed earlier, broadcast and streaming workflows use encoders that are significantly different in their feature sets and tuning/stream constraints. Perhaps the most extreme example of such differences is a statmux regime, where encoders are operating under control of a multiplexer—an operating regime that has no parallel in streaming.
Consequently, if cloud workflows are intended to be used for producing streams going back to broadcast distribution, the tuning or upgrade of existing cloud encoders will be needed. For implementation of statmux, the multiplexer should also be natively implemented in cloud and integrated with encoders.

3.4. Cloud Playout

The last, and most important technology that is needed to enable convergence is a high-quality, cloud-based implementation of a playout system.
The design of such a system is a non-trivial task. As we discussed earlier, current cloud-based video workflows typically use HLS/DASH-type segmented media formats, causing them to operate with significant delays and random-access limitations. One cannot build a broadcast-grade playout system based on such architecture. Even so-called ultra-low-delay versions of HLS, DASH, or CMAF [87,88,89] are inadequate. For most master control operations, such as previews, non-linear editing, switching, etc., frame-level access accuracy is an essential requirement.
In Figure 8, we show one possible cloud playout system architecture. To enable frame-level random access, this system uses an internal Intra-only mezzanine format. Such a format could use any image or video codec operating in Intra-coding mode, along with PCM (Pulse-Code Modulation) audio, and an index enabling access to each frame. Both input live feeds and pre-recorded content are converted into this internal mezzanine format and placed on cloud storage. All subsequent operations, such as previews, non-linear editing, and the selection and mixing of content producing channel outputs, are done by accessing media in this mezzanine format. The final stream selections, addition of logos, transitions, etc. are done by “stream processor” elements.
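The frame index mentioned above can be sketched as a simple table mapping frame numbers to byte ranges: because every frame in an Intra-only mezzanine is independently decodable, locating a frame's bytes is all that is needed for frame-accurate access. This is a conceptual sketch; a real index would also carry timestamps and live in a durable store alongside the media.

```python
class MezzanineIndex:
    """Byte-offset index over an Intra-only mezzanine file.

    Since every frame is independently decodable, any frame can be
    fetched by number (a single ranged read from cloud storage)
    without decoding its neighbours.
    """
    def __init__(self):
        self.offsets = []   # offsets[n] = (byte_offset, byte_length) of frame n
        self._pos = 0

    def append(self, frame_byte_length):
        """Register the next frame as it is written to the mezzanine file."""
        self.offsets.append((self._pos, frame_byte_length))
        self._pos += frame_byte_length

    def locate(self, frame_number):
        """Return (byte_offset, byte_length) for a ranged read of the frame."""
        return self.offsets[frame_number]

# Hypothetical frame sizes as an ingest converter writes the mezzanine.
idx = MezzanineIndex()
for size in (100_000, 120_000, 95_000):
    idx.append(size)
```

A stream processor performing an edit at frame n would use `locate(n)` to issue a ranged read, which is what makes frame-accurate previews, cuts, and fault-tolerant switching feasible on segmentless cloud storage.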
In addition to enabling frame-level-accurate processing operations, the use of Intra-only mezzanine format also minimizes impacts of possible failures in the system. All signal processing blocks shown in Figure 8 can be run in a redundant fashion, with checks and switches added to ensure frame-accurate fault-tolerance.

4. Transitioning Broadcast to Cloud

In this section, we discuss possible ways in which broadcast and cloud-based video workflows may evolve in the future. We offer three examples of possible hybrid architectures, with different degrees of migration of processing to cloud.

4.1. Cloud-Based OTT Systems

In Figure 9, we show a hybrid architecture where the cloud is used only to implement OTT/streaming services. Everything else stays on prem. This is the simplest possible example of integration.
To route streams to cloud, the broadcast workflow produces contribution streams, one for each channel, and then sends them over IP (e.g., using RTP+FEC or SRT) to the cloud.
The cloud platform receives such streams, performs necessary conversions, transcodes them, and distributes them over CDNs to clients. As shown in Figure 9, the cloud platform may also be used to implement DVR or time-shift TV-type functionality, DRM protection, SSAI, analytics, etc. All standard techniques for optimizing multi-format/multi-screen streaming delivery (dynamic packaging, dynamic manifest generation, optimized profiles, etc. [23]) can also be employed in this case.
To make such a system work well, the main technologies/improvements needed include:
  • reliable real-time ingest, e.g., using RTP+FEC, SRT, RIST, or Zixi-type protocols and/or a dedicated link, such as AWS Direct Connect [81];
  • improvements in the signal processing stack, achieving artifact-free conversion of broadcast formats to those used in OTT/streaming;
  • improvements in metadata handling, including full pass-through of SCTE-35 and compliant implementation of SSAI and CSAI functionality based on it.
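As an illustration of the first item above, the following sketch builds an ffmpeg command line that relays a contribution stream to a hypothetical cloud SRT ingest endpoint. The endpoint URL and function name are illustrative assumptions, an ffmpeg build with SRT support is assumed, and note that ffmpeg's SRT `latency` option is specified in microseconds.

```python
def srt_contribution_cmd(input_source, ingest_url, latency_ms=2000, passphrase=None):
    """Build an ffmpeg command line relaying a contribution stream over SRT
    (illustrative sketch; the ingest URL is a placeholder, and an ffmpeg
    build with SRT support is assumed)."""
    # ffmpeg's srt protocol expresses latency in microseconds
    url = f"{ingest_url}?latency={latency_ms * 1000}"
    if passphrase:
        url += f"&passphrase={passphrase}"
    return [
        "ffmpeg", "-re", "-i", input_source,
        "-c", "copy",          # pass the contribution encode through untouched
        "-f", "mpegts", url,   # SRT carries an MPEG-2 TS payload
    ]
```

The resulting argument list could then be launched with `subprocess.run()`; in practice, the SRT latency would be tuned to the round-trip time of the contribution link, and FEC- or ARQ-based protection configured accordingly.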
Generally, however, hybrid architectures of this kind have already been deployed and proven effective in practice. Some of the above-mentioned gap-closing technologies have also been implemented. For instance, cloud ingest using RTP, SMPTE 2022-1, SMPTE 2022-2, or SRT, improvements in SCTE-35 support for ad insertion, and improvements in the encoding stack were among recent updates to the Brightcove VideoCloud system [90].

4.2. Cloud-Based Ingest, Playout, and OTT Delivery System

In Figure 10, we show a more advanced architecture, in which not only OTT delivery, but also ingest, media asset management, and playout are offloaded to cloud.
As can be readily seen, moving playout functionality to the cloud also enables the use of the cloud platform for ingest. This is particularly helpful on a global scale, as major cloud systems have data centers in all major regions, and so the contribution link is only needed to deliver content to the nearest local datacenter. Media asset management also naturally moves to cloud in this case.
With cloud-based playout, there will still be a need for a control room with monitors, switch panels, etc., but it will be reduced to the function of a thin client. All storage, redundancy management, media processing, etc., will happen in cloud, significantly reducing the required investments in hardware as well as operating costs.
In the system depicted in Figure 10, the broadcast distribution encoders, multiplexer, and all subsequent operations stay on prem without any changes. This way, broadcasters can operate all current ATSC equipment until ATSC 3.0 matures or there is some other serious need to replace it. This is another hybrid cloud + on-prem architecture, which we believe will make sense in practice.
To make such a system work, the following are needed in addition to all improvements mentioned earlier:
  • a cloud-based, broadcast-grade playout system;
  • a direct link connection to the cloud, ensuring low-latency monitoring and real-time responses in the operation of the cloud playout system;
  • improvements in cloud-run encoders, specifically those acting as contribution transcoders sending broadcast-compliant streams back to the on-prem system.

4.3. Cloud-Based Broadcast and OTT Delivery System

Finally, in Figure 11, we show an architecture in which virtually all signal processing, transcoding, and multiplexing operations are moved to cloud.
In addition to running playout, this system also runs the broadcast transcoders and the multiplexer in cloud. The final multiplex TS is then sent back to the on-prem distribution system, mostly just to be relayed to modulators and amplifiers, or (via IP or ASI) to next-tier stations or MVPD headends.
To make this system work, the following are needed in addition to all improvements mentioned earlier:
  • broadcast-grade transcoders and a multiplexer natively implemented in cloud;
  • implementation of statmux capability, generation and insertion of all related program and system information, possible addition of datacast service capability, etc.
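At the core of the statmux capability mentioned above is a bit-allocation loop that divides a fixed total multiplex rate among channels according to their instantaneous coding complexity. The following is a deliberately simplified sketch of that allocation step (real statmux controllers add look-ahead, rate smoothing, and VBV buffer constraints; the function and parameter names here are illustrative):

```python
def statmux_allocate(complexities, total_bitrate, min_rate=0.0):
    """Allocate per-channel bitrates for one statmux period.

    Each channel receives a guaranteed floor (min_rate), and the remaining
    pool is split proportionally to the complexity reported by each encoder.
    Simplified sketch of statistical multiplexing, not a production design.
    """
    pool = total_bitrate - min_rate * len(complexities)  # bits left after floors
    total_c = sum(complexities)
    return [min_rate + pool * c / total_c for c in complexities]
```

For example, with a 40 Mbps multiplex and three channels reporting complexities 1, 1, and 2, the busiest channel receives twice the rate of the others while the total stays exactly at the mux rate. In a cloud implementation, the complexity reports and rate assignments would be exchanged between encoder instances and the multiplexer over low-latency internal messaging.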
This architecture is indeed an extreme example, in which virtually all data- and processing-intensive operations are migrated to cloud. It is the most technically challenging to implement, but also the most promising, as it enables the best utilization of cloud and all the benefits that it brings.

4.4. Comparison of the Proposed Systems

In Table 4, we present a comparison of the features of the proposed hybrid architectures relative to the pure on-prem broadcast + OTT delivery solution.
The last row of this table offers estimates of the relative costs of physical equipment required by each system. Such estimates are based on the assumption that the OTT chain delivers about the same number of channels and handles a comparable volume of delivered content as the broadcast chain. It can be observed that even the simplest combined solution (Architecture 1) could potentially eliminate up to 30–50% of equipment costs. With more evolved architectures, the investment in physical equipment can be reduced further, down to about 10–20% of that of the pure on-prem solution.

5. Scale and Performance of Cloud-Based Media Delivery Systems

In this section, we present a set of field results demonstrating the scalability and performance of today’s cloud-based OTT media delivery systems. As examples, we take data observed for several randomly chosen media distributors using the Brightcove OVP system [90]. The data were captured during the period of 19–22 December 2020.
First, in Figure 12, we show the numbers of concurrent live streaming encoders run by different users of the cloud-based OVP. Such numbers can be very different, from single encoders/channels for some users to several hundred for others. The overall number of live transcoders run by this set of OVP users was close to 1000. This compares well with the typical numbers of channels supported in today’s broadcast systems.
Figure 12 also shows that the numbers of concurrent live encoders in a cloud-based system may change over time. There were periods when the system scaled them up, and periods when it scaled them down. Such fluctuations are generally related to the creation of new (or termination of existing) live streaming events, the management of redundant streams, or the management of a pool of “available” encoders instantiated speculatively so that new streams can be launched and streamed right away, without any additional delays.
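The speculative-pool behavior described above can be sketched as a simple scaling policy (the function names and sizing parameters below are hypothetical; a real system would also account for redundancy pairs and regional placement):

```python
def pool_target(active_streams, pending_events, warm_spares=5):
    """Target number of live encoder instances: one per active stream, one per
    imminent event, plus a pool of pre-warmed 'available' encoders so that new
    streams can start without instance-launch delays (illustrative policy)."""
    return active_streams + pending_events + warm_spares

def scale_actions(current, target):
    """Translate the target pool size into a launch/terminate decision."""
    if target > current:
        return ("launch", target - current)
    if target < current:
        return ("terminate", current - target)
    return ("hold", 0)
```

Running such a policy periodically produces exactly the up-and-down fluctuations in concurrent encoder counts visible in Figure 12.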
The dynamics of creation of new live streams in this system are further illustrated in Figure 13. Here, we also observed high variation: from a single job to almost 100 jobs created at each instant of time. This indicates that the composition of work processed by this system includes not only constantly running 24/7 channels, but also a good volume of special events (e.g., concerts, sports events, etc.) triggering the creation of new live delivery chains. Cloud-based architectures are highly suitable for handling variable workloads of this kind.
Next, in Figure 14, we show CDN bandwidth statistics reported for the same set of users of the cloud-based delivery platform. The graphs show the amounts of data delivered at the CDN edge, the data pulled from origin servers, and the amounts of CDN midgress traffic. Here, we also observed significant variations. Between 8:00 and 11:00 on 12/20 (US Eastern Standard Time), we saw a cascade of two significant spikes in traffic. The volume of data increased almost 10× during this period. However, we also noticed that the amount of origin traffic during the same period did not increase much. This illustrates that the CDNs managed to absorb these spikes in traffic successfully. More generally, however, the amount of origin traffic may also fluctuate significantly. To support such a variable load, cloud-based delivery platforms provide means for auto-scaling and balancing of load on origin servers.
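The behavior observed in Figure 14 can be quantified by a simple offload ratio, i.e., the fraction of edge-delivered traffic that the CDN absorbs without pulling from origin. This is an illustrative calculation on the reported edge and origin volumes, not a metric emitted by the platform itself:

```python
def cdn_offload_ratio(edge_bytes, origin_bytes):
    """Fraction of edge-delivered traffic served from CDN caches rather than
    from origin servers (illustrative sketch; assumes edge_bytes > 0)."""
    return 1.0 - origin_bytes / edge_bytes
```

Under the conditions described in the text, a roughly 10× spike in edge traffic with near-flat origin traffic means the offload ratio actually improves during the spike, since nearly all of the additional requests are served from CDN caches.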
Finally, in Figure 15, we also plot the numbers of concurrent players pulling live streaming content. It can be observed that this figure looks somewhat similar to the CDN-reported statistics. Between 8:00 and 11:00 on 12/20 (US Eastern Standard Time), we also saw a cascade of two spikes, during which almost 1 M concurrent viewers joined. This explains the spikes in CDN traffic noted earlier. Such spikes are quite common for popular events, and the system is designed to handle them efficiently.
Naturally, the above statistics capture only a small example set of use cases of today’s existing cloud-based OTT delivery platforms. However, they show that such platforms are operational, mass-deployed, and capable of handling volumes of transcoding and media delivery streams comparable to those used in broadcast distribution systems.
As also shown, the numbers of concurrent live events/streams, transcoders, transmuxers, origins, and other elements of the delivery chain can be highly variable in practice. Cloud-based deployments are ideally suited for handling such variability in resource requirements in a cost-efficient manner. Transitioning additional components of broadcast workflows to cloud will lead to further reductions in hardware investments, will simplify management and maintenance, and will make the overall design of such systems much more flexible, extensible, and future-proof.

6. Conclusions

In this paper, we have studied the differences between on-premise broadcast and cloud-based online video delivery workflows and identified means needed for bridging the gaps between them. Such means include improvements in cloud ingest, signal processing stacks, transcoder capabilities, and, most importantly, a broadcast-grade cloud playout system. To implement a cloud playout system, we have suggested an architecture employing an Intra-only mezzanine format and associated processing blocks that can be easily replicated and operated in a fault-tolerant fashion. We finally considered possible evolutions of broadcast and cloud-based video systems and suggested several possible hybrid architectures, with different degrees of offloading of processing in cloud, that are likely to emerge in the future. Examples of operational statistics observed in today’s mass-deployed cloud-based media delivery systems were also shown. These statistics confirm that such systems can indeed handle the load required for transitioning additional elements of broadcast systems to cloud.

Author Contributions

Conceptualization, Y.R. and J.C.; methodology, Y.R., J.C. and B.Z.; software, J.C. and B.Z.; validation, Y.R., J.C. and B.Z.; formal analysis, Y.R.; investigation, Y.R. and J.C.; resources, Y.R., J.C. and B.Z.; data curation, B.Z. and Y.R.; writing—original draft preparation, Y.R.; writing—review and editing, J.C. and B.Z.; visualization, Y.R. and B.Z.; supervision, Y.R.; project administration, Y.R.; funding acquisition, Y.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Pizzi, S.; Jones, G. A Broadcast Engineering Tutorial for Non-Engineers, 4th ed.; National Association Broadcasters (NAB): Chicago, IL, USA, 2014; 354p, ISBN 13:978-0415733380/10:0415733380. [Google Scholar]
  2. Luther, A.; Inglis, A. Video Engineering, 3rd ed.; McGraw-Hill: New York, NY, USA, 1999; 549p. [Google Scholar]
  3. Wu, D.; Hou, Y.T.; Zhu, W.; Zhang, Y.; Peha, J.M. Streaming video over the internet: Approaches and directions. IEEE Trans. Circuits Syst. Video Technol. 2001, 11, 282–300. [Google Scholar]
  4. Conklin, G.J.; Greenbaum, G.S.; Lillevold, K.O.; Lippman, A.F.; Reznik, Y.A. Video coding for streaming media delivery on the internet. IEEE Trans. Circuits Syst. Video Technol. 2001, 11, 269–281. [Google Scholar] [CrossRef]
  5. Pantos, R.; May, W. HTTP Live Streaming, RFC 8216. IETF. Available online: https://tools.ietf.org/html/rfc8216 (accessed on 1 November 2020).
  6. ISO/IEC 23009-1:2014. Information Technology—Dynamic Adaptive Streaming Over HTTP (DASH)—Part 1: Media Presentation Description and Segment Formats. ISO/IEC, 2014. Available online: https://www.iso.org/about-us.html (accessed on 1 November 2020).
  7. Microsoft Smooth Streaming. Available online: https://www.iis.net/downloads/microsoft/smooth-streaming (accessed on 1 November 2020).
  8. Evens, T. Co-opetition of TV broadcasters in online video markets: A winning strategy? Int. J. Digit. Telev. 2014, 5, 61–74. [Google Scholar] [CrossRef][Green Version]
  9. Nielsen Holdings Plc, Total Audience Report. 2020. Available online: https://www.nielsen.com/us/en/client-learning/tv/nielsen-total-audience-report-february-2020/ (accessed on 1 November 2020).
  10. Sandvine. The Global Internet Phenomena Report. 2019. Available online: https://www.sandvine.com/hubfs/Sandvine_Redesign_2019/Downloads/Internet%20Phenomena/Internet%20Phenomena%20Report%20Q32019%2020190910.pdf (accessed on 1 November 2020).
  11. Frost & Sullivan. Analysis of the Global Online Video Platforms Market. Frost & Sullivan. 2014. Available online: https://store.frost.com/analysis-of-the-global-online-video-platforms-market.html (accessed on 1 November 2020).
  12. ETSI TS 102 796. Hybrid Broadcast Broadband TV. ETSI, 2016. Available online: https://www.etsi.org/deliver/etsi_ts/102700_102799/102796/01.04.01_60/ts_102796v010401p.pdf (accessed on 1 November 2020).
  13. ATSC A/331:2020. Signaling, Delivery, Synchronization, and Error Protection. ATSC, 2020. Available online: https://www.atsc.org/atsc-documents/3312017-signaling-delivery-synchronization-error-protection/ (accessed on 1 November 2020).
  14. Stockhammer, T.; Sodagar, I.; Zia, W.; Deshpande, S.; Oh, S.; Champel, M. Dash in ATSC 3.0: Bridging the gap between OTT and broadcast. In Proceedings of the IET Conference Proceedings, IBC 2016 Conference, Amsterdam, The Netherlands, 8–12 September 2016; Volume 1, p. 24. [Google Scholar]
  15. AWS Elemental. Video Processing and Delivery Moves to the Cloud, e-book. 2018. Available online: https://www.elemental.com/resources/white-papers/e-book-video-processing-delivery-moves-cloud/ (accessed on 1 November 2020).
  16. Fautier, T. Cloud Technology Drives Superior Video Encoding; In SMPTE 2019; SMPTE: Los Angeles, CA, USA, 2019; pp. 1–9. [Google Scholar]
  17. ISO/IEC 13818-1:2019. Information Technology—Generic Coding of Moving Pictures and Associated Audio Information: Systems—Part 1: Systems. ISO/IEC, 2019; Available online: https://www.iso.org/standard/75928.html (accessed on 1 November 2020).
  18. ATSC A/65B. Program and System Information Protocol for Terrestrial Broadcast and Cable (PSIP). ATSC, 2013. Available online: https://www.atsc.org/wp-content/uploads/2015/03/Program-System-Information-Protocol-for-Terrestrial-Broadcast-and-Cable.pdf (accessed on 1 November 2020).
  19. ANSI/SCTE 35 2007. Digital Program Insertion Cueing Message for Cable. SCTE, 2007. Available online: https://webstore.ansi.org/standards/scte/ansiscte352007 (accessed on 1 November 2020).
  20. ANSI/SCTE 67 2010. Recommended Practice for SCTE 35 Digital Program Insertion Cueing Message for Cable. SCTE, 2010. Available online: https://webstore.ansi.org/standards/scte/ansiscte672010 (accessed on 1 November 2020).
  21. Lechner, B.J.; Chernock, R.; Eyer, M.; Goldberg, A.; Goldman, M. The ATSC transport layer, including Program and System Information (PSIP). Proc. IEEE 2006, 94, 77–101. [Google Scholar] [CrossRef]
  22. Reznik, Y.; Lillevold, K.; Jagannath, A.; Greer, J.; Corley, J. Optimal design of encoding profiles for ABR streaming. In Proceedings of the Packet Video Workshop, Amsterdam, The Netherlands, 12–15 June 2018. [Google Scholar] [CrossRef]
  23. Reznik, Y.; Li, X.; Lillevold, K.; Peck, R.; Shutt, T.; Marinov, R. Optimizing Mass-Scale Multi-Screen Video Delivery. In Proceedings of the 2019 NAB Broadcast Engineering and Information Technology Conference, Las Vegas, NV, USA, 6–11 April 2019. [Google Scholar]
  24. Davidson, G.A.; Isnardi, M.A.; Fielder, L.D.; Goldman, M.S.; Todd, C.C. ATSC Video and Audio Coding. Proc. IEEE 2006, 94, 60–76. [Google Scholar] [CrossRef]
  25. ATSC A/53 Part 1: 2013. ATSC Digital Television Standard: Part 1—Digital Television System. ATSC, 2013. Available online: https://www.atsc.org/wp-content/uploads/2015/03/A53-Part-1-2013-1.pdf (accessed on 1 November 2020).
  26. ATSC A/53 Part 4:2009. ATSC Digital Television Standard: Part 4—MPEG-2 Video System Characteristics. ATSC, 2009. Available online: https://www.atsc.org/wp-content/uploads/2015/03/a_53-Part-4-2009-1.pdf (accessed on 1 November 2020).
  27. ATSC A/54A. Recommended Practice: Guide to the Use of the ATSC Digital Television Standard. ATSC, 2003. Available online: https://www.atsc.org/wp-content/uploads/2015/03/a_54a_with_corr_1.pdf (accessed on 1 November 2020).
  28. ATSC A/72 Part 1:2015. Video System Characteristics of AVC in the ATSC Digital Television System. ATSC, 2015. Available online: https://www.atsc.org/wp-content/uploads/2015/03/A72-Part-1-2015-1.pdf (accessed on 1 November 2020).
  29. ATSC A/72 Part 2:2014. AVC Video Transport Subsystem Characteristics. ATSC, 2014. Available online: https://www.atsc.org/wp-content/uploads/2015/03/A72-Part-2-2014-1.pdf (accessed on 1 November 2020).
  30. ETSI TS 101 154. Digital Video Broadcasting (DVB): Implementation Guidelines for the use of MPEG-2 Systems, Video and Audio in Satellite, Cable and Terrestrial Broadcasting Applications. Doc. ETSI TS 101 154 V1.7.1. Annex, B., Ed.; 2019. Available online: https://standards.iteh.ai/catalog/standards/etsi/af36a167-779e-4239-b5a7-89356c6c2dde/etsi-ts-101-154-v2.6.1-2019-09 (accessed on 1 November 2020).
  31. ANSI/SCTE 43 2015. Digital Video Systems Characteristics Standard for Cable Television. SCTE, 2015. Available online: https://webstore.ansi.org/standards/scte/ansiscte432015 (accessed on 1 November 2020).
  32. ANSI/SCTE 128 2010-a. AVC Video Systems and Transport Constraints for Cable Television. SCTE, 2010. Available online: https://webstore.ansi.org/standards/scte/ansiscte1282010 (accessed on 1 November 2020).
  33. Schulzrinne, H.; Casner, S.; Frederick, R.; Jacobson, V. RTP: A Transport Protocol for Real-Time Applications. RFC 1889. IETF, 1996. Available online: https://tools.ietf.org/html/rfc1889 (accessed on 1 November 2020).
  34. SMPTE ST 2022-1:2007. Forward Error Correction for Real-Time Video/Audio Transport over IP Networks. ST 2022-1. SMPTE, 2007. Available online: https://ieeexplore.ieee.org/document/7291470/versions#versions (accessed on 1 November 2020).
  35. SMPTE ST 2022-2:2007. Unidirectional Transport of Constant Bit Rate MPEG-2 Transport Streams on IP Networks. ST 2022-2; SMPTE, 2007. Available online: https://ieeexplore.ieee.org/document/7291740 (accessed on 1 November 2020).
  36. Zixi, LLC. Streaming Video over the Internet and Zixi. 2015. Available online: http://www.zixi.com/PDFs/Adaptive-Bit-Rate-Streaming-and-Final.aspx (accessed on 1 November 2020).
  37. OC-SP-MEZZANINE-C01-161026. Mezzanine Encoding Specification. Cable Television Laboratories, Inc., 2016. Available online: https://community.cablelabs.com/wiki/plugins/servlet/cablelabs/alfresco/download?id=1d76e930-6d98-4de3-89ee-9d0fb4b5292a (accessed on 1 November 2020).
  38. Adobe Systems, Real-Time Messaging Protocol (RTMP) Specification. Version 1.0. 2012. Available online: https://www.adobe.com/devnet/rtmp.html (accessed on 1 November 2020).
  39. Haivision. Secure Reliable Transport (SRT). 2019. Available online: https://github.com/Haivision/srt (accessed on 1 November 2020).
  40. VSF TR-06-1. Reliable Internet Stream Transport (RIST) Protocol Specification—Simple Profile. Video Services Forum. 2018. Available online: http://vsf.tv/download/technical_recommendations/VSF_TR-06-1_2018_10_17.pdf (accessed on 1 November 2020).
  41. OC-SP-CEP3.0-I05-151104. Content Encoding Profiles 3.0 Specification. Cable Television Laboratories, Inc., 2015. Available online: https://community.cablelabs.com/wiki/plugins/servlet/cablelabs/alfresco/download?id=c7eb769e-1020-402c-b2f2-d839ee532945 (accessed on 1 November 2020).
  42. Ozer, J. Encoding for Multiple Devices. Streaming Media Magazine. 2013. Available online: http://www.streamingmedia.com/Articles/ReadArticle.aspx?ArticleID=88179&fb_comment_id=220580544752826_937649 (accessed on 1 November 2020).
  43. Ozer, J. Encoding for Multiple-Screen Delivery. Streaming Media East. 2013. Available online: https://www.streamingmediablog.com/wp-content/uploads/2013/07/2013SMEast-Workshop-Encoding.pdf (accessed on 1 November 2020).
  44. Apple. HLS Authoring Specification for Apple Devices. 2019. Available online: https://developer.apple.com/documentation/http_live_streaming/hls_authoring_specification_for_apple_devices (accessed on 1 November 2020).
  45. DASH-IF. DASH-IF Interoperability Points, v4.3. 2018. Available online: https://dashif.org/docs/DASH-IF-IOP-v4.3.pdf (accessed on 1 November 2020).
  46. ETSI TS 103 285 v1.2.1. Digital Video Broadcasting (DVB); MPEG-DASH Profile for Transport of ISO BMFF Based DVB Services Over IP Based Networks. 2020. Available online: https://www.etsi.org/deliver/etsi_ts/103200_103299/103285/01.03.01_60/ts_103285v010301p.pdf (accessed on 1 November 2020).
  47. SMPTE RP 145. Color Monitor Colorimetry. SMPTE, 1987. Available online: https://standards.globalspec.com/std/1284848/smpte-rp-145 (accessed on 1 November 2020).
  48. SMPTE 170M. Composite Analog Video Signal—NTSC for Studio Applications. SMPTE, 1994. Available online: https://standards.globalspec.com/std/892300/SMPTE%20ST%20170M (accessed on 1 November 2020).
  49. EBU Tech. 3213-E. E.B.U. Standard for Chromaticity Tolerances for Studio Monitors. EBU, 1975. Available online: https://tech.ebu.ch/docs/tech/tech3213.pdf (accessed on 1 November 2020).
  50. ITU-R Recommendation BT.601. Studio Encoding Parameters of Digital Television For standard 4:3 and Wide Screen 16:9 Aspect Ratios. ITU-R, 2011. Available online: https://www.itu.int/dms_pubrec/itu-r/rec/bt/R-REC-BT.601-7-201103-I!!PDF-E.pdf (accessed on 1 November 2020).
  51. ITU-R Recommendation BT.709. Parameter Values for the HDTV Standards for Production and International Programme Exchange. ITU-R. ITU-R, 2015. Available online: https://www.itu.int/dms_pubrec/itu-r/rec/bt/R-REC-BT.709-6-201506-I!!PDF-E.pdf (accessed on 1 November 2020).
  52. IEC 61966-2-1:1999. Multimedia Systems and Equipment—Colour Measurement and Management—Part 2-1: Colour Management—Default RGB Colour Space—sRGB. IEC, 1999. Available online: https://webstore.iec.ch/publication/6169 (accessed on 1 November 2020).
  53. Brydon, N. Saving Bits—The Impact of MCTF Enhanced Noise Reduction. SMPTE J. 2002, 111, 23–28. [Google Scholar] [CrossRef]
  54. ISO/IEC 13818-2:2013. Information Technology—Generic Coding of Moving Pictures and Associated Audio Information—Part 2: Video. ISO/IEC, 2013. Available online: https://www.iso.org/standard/61152.html (accessed on 1 November 2020).
  55. ISO/IEC 14496-10:2003. Information Technology—Coding of Audio-Visual Objects—Part 10: Advanced Video Coding. ISO/IEC, 2003. Available online: https://www.iso.org/standard/37729.html (accessed on 1 November 2020).
  56. ISO/IEC 23008-2:2013. Information Technology –High Efficiency Coding and Media Delivery in Heterogeneous Environments—Part 2: High Efficiency Video Coding. ISO/IEC, 2013. Available online: https://www.iso.org/standard/35424.html (accessed on 1 November 2020).
  57. AOM AV1. AV1 Bitstream & Decoding Process Specification, v1.0.0. Alliance for Open Media. 2019. Available online: https://aomediacodec.github.io/av1-spec/av1-spec.pdf (accessed on 1 November 2020).
  58. CTA 5001. Web Application Video Ecosystem—Content Specification. CTA WAVE, 2018. Available online: https://cdn.cta.tech/cta/media/media/resources/standards/pdfs/cta-5001-final_v2_pdf.pdf (accessed on 1 November 2020).
  59. Perkins, M.; Arnstein, D. Statistical multiplexing of multiple MPEG-2 video programs in a single channel. SMPTE J. 1995, 4, 596–599. [Google Scholar] [CrossRef]
  60. Boroczky, L.; Ngai, A.Y.; Westermann, E.F. Statistical multiplexing using MPEG-2 video encoders. IBM J. Res. Dev. 1999, 43, 511–520. [Google Scholar] [CrossRef]
  61. CEA-608-B. Line 21 data services. Consumer Electronics Association. 2008. Available online: https://webstore.ansi.org/standards/cea/cea6082008r2014ansi (accessed on 1 November 2020).
  62. CEA-708-B. Digital television (DTV) closed captioning. Consumer Electronics Association. 2008. Available online: https://www.scribd.com/document/70239447/CEA-708-B (accessed on 1 November 2020).
  63. CEA CEB16. Active Format Description (AFD) & Bar Data Recommended Practice. CEA, 2006. Available online: https://webstore.ansi.org/standards/cea/ceaceb162012 (accessed on 1 November 2020).
  64. SMPTE 2016-1. Standard for Television—Format for Active Format Description and Bar Data. SMPTE, 2007. Available online: https://www.techstreet.com/standards/smpte-2016-1-2009?product_id=1664006 (accessed on 1 November 2020).
  65. OC-SP-EP-I01-130118. Encoder Boundary Point Specification. Cable Television Laboratories, Inc., 2013. Available online: https://community.cablelabs.com/wiki/plugins/servlet/cablelabs/alfresco/download?id=2a4f4cc6-3763-40b9-9ace-7de923559187 (accessed on 1 November 2020).
  66. FairPlay Streaming. Available online: https://developer.apple.com/streaming/fps/ (accessed on 1 November 2020).
  67. PlayReady. Available online: https://www.microsoft.com/playready/overview/ (accessed on 1 November 2020).
  68. Widevine. Available online: https://www.widevine.com/solutions/widevine-drm (accessed on 1 November 2020).
  69. ISO/IEC 14496-12:2015. Information Technology—Coding of Audio-Visual Objects—Part 12: ISO Base Media File Format. 2015. Available online: https://www.iso.org/standard/68960.html (accessed on 1 November 2020).
  70. ISO/IEC 23000-19:2018. Information Technology—Coding of Audio-Visual Objects—Part 19: Common Media Application Format (CMAF) for Segmented Media. 2018. Available online: https://www.iso.org/standard/71975.html (accessed on 1 November 2020).
  71. ID3 Tagging System. Available online: http://www.id3.org/id3v2.3.0 (accessed on 1 November 2020).
  72. W3C WebVTT. The Web Video Text Tracks. W3C, 2018. Available online: http://dev.w3.org/html5/webvtt/ (accessed on 1 November 2020).
  73. W3C TTML1. Timed Text Markup Language 1. W3C, 2019. Available online: https://www.w3.org/TR/2018/REC-ttml1-20181108/ (accessed on 1 November 2020).
  74. W3C IMSC1. TTML Profiles for Internet Media Subtitles and Captions 1.0. W3C, 2015. Available online: https://dvcs.w3.org/hg/ttml/raw-file/tip/ttml-ww-pro-33files/ttml-ww-profiles.html (accessed on 1 November 2020).
  75. Apple. Incorporating Ads into A Playlist. Available online: https://developer.apple.com/documentation/http_live_streaming/example_playlists_for_http_live_streaming/incorporating_ads_into_a_playlist (accessed on 1 November 2020).
  76. ANSI/SCTE 30 2017. Digital Program Insertion Splicing API. SCTE, 2017. Available online: https://webstore.ansi.org/standards/scte/ansiscte302017 (accessed on 1 November 2020).
  77. ANSI/SCTE 172 2011. Constraints on AVC Video Coding for Digital Program Insertion. SCTE, 2011. Available online: https://webstore.ansi.org/preview-pages/SCTE/preview_SCTE+172+2011.pdf (accessed on 1 November 2020).
  78. SMPTE 259:2008. SMPTE Standard—For Television—SDTV—Digital Signal/Data—Serial Digital Interface. SMPTE, 2008. Available online: https://ieeexplore.ieee.org/document/7292109 (accessed on 1 November 2020).
  79. SMPTE 292-1:2018. SMPTE Standard—1.5 Gb/s Signal/Data Serial Interface. SMPTE, 2018. Available online: https://ieeexplore.ieee.org/document/8353270 (accessed on 1 November 2020).
  80. SMPTE 2110-20:2017. SMPTE Standard—Professional Media over Managed IP Networks: Uncompressed Active Video. SMPTE, 2017. Available online: https://ieeexplore.ieee.org/document/8167389 (accessed on 1 November 2020).
  81. AWS Direct Connect. Available online: https://aws.amazon.com/directconnect/ (accessed on 1 November 2020).
  82. Azure ExpressRoute. Available online: https://azure.microsoft.com/en-us/services/expressroute/ (accessed on 1 November 2020).
  83. FFMPEG Filter Documentation. Available online: https://ffmpeg.org/ffmpeg-filters.html (accessed on 1 November 2020).
  84. Vanam, R.; Reznik, Y. Temporal Sampling Conversion using Bi-directional Optical Flows with Dual Regularization. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Abu Dhabi, UAE, 25–28 October 2020. [Google Scholar]
  85. Teranex Standards Converters. Available online: https://www.blackmagicdesign.com/products/teranex (accessed on 1 November 2020).
  86. Grass Valley KudosPro Converters. Available online: https://www.grassvalley.com/products/kudospro/ (accessed on 1 November 2020).
  87. Apple. Protocol Extension for Low-Latency HLS (Preliminary Specification). Apple Inc., 2020. Available online: https://developer.apple.com/documentation/http_live_streaming/protocol_extension_for_low-latency_hls_preliminary_specification (accessed on 1 November 2020).
  88. DASH-IF. DASH-IF ATSC3.0 IOP. 2012. Available online: https://dashif.org/docs/DASH-IF-IOP-for-ATSC3-0-v1.1.pdf (accessed on 1 November 2020).
  89. Law, W. Ultra Low Latency with CMAF. Available online: https://www.akamai.com/us/en/multimedia/documents/white-paper/low-latency-streaming-cmaf-whitepaper.pdf (accessed on 1 November 2020).
  90. Brightcove VideoCloud System. Available online: https://www.brightcove.com/en/online-video-platform (accessed on 1 November 2020).
Figure 1. Conceptual diagram of processing operations in broadcast distribution.
Figure 2. Conceptual diagram of processing operations in cloud-based online video platform.
Figure 3. An example of cloud-based redundant live streaming workflow.
Figure 4. Decoding and format conversion chain needed to operate with cable/broadcast content.
Figure 5. Example of MPEG2 blocking artifacts (a) and their removal by deblocking filter (b).
Figure 6. Example of using denoising filter for improving quality of final transcoded signal.
Figure 7. Comparison of outputs of basic (a) and advanced (b) deinterlacing filters.
Figure 8. Example architecture of cloud playout system.
Figure 9. Hybrid architecture with Over the Top (OTT)/streaming workflow offloaded to cloud.
Figure 10. Hybrid architecture with ingest, playout and OTT/streaming workflow offloaded to cloud.
Figure 11. Hybrid cloud-based broadcast + OTT system architecture.
Figure 12. The numbers of concurrent live encoders run by different users of cloud-based OVP.
Figure 13. The numbers of new live streams created at each instance of time.
Figure 14. CDN-reported bandwidth statistics.
Figure 15. The numbers of concurrent streaming players pulling the content.
Table 1. Comparison of video formats used in broadcast and Internet streaming.

| Format Characteristic | Broadcast Systems | Online Platforms/Streaming |
|---|---|---|
| Temporal sampling | SD: interlaced (bff), telecine; HD: progressive, interlaced (tff), telecine | progressive only |
| Framerates [Hz] | 24,000/1001, 24, 25, 30,000/1001, 30, 50, 60,000/1001, 60 | same as source, typically capped at 30 Hz or 60 Hz |
| Display Aspect Ratio (DAR) | SD: 4:3, 16:9; HD: 16:9 | same as indicated by the source |
| Sample Aspect Ratio (SAR) | SD: 1:1, 12:11, 10:11, 16:11, 40:33, 24:11, 32:11, 80:33, 18:11, 15:11, 64:33, 160:99; HD: 1:1 (most common), 4:3 (1440 mode) | 1:1, or SARs rounded to the nearest 1:1, are usually preferred |
| Resolutions | SD (NTSC-derived): 480i; SD (PAL-, SECAM-derived): 576i; HD: 720p, 1080i | same as source + down-scaled versions; additional restrictions may apply [45,46] |
| Chroma sampling | 4:2:0, 4:2:2 | 4:2:0 only |
| Primary colors | SD (NTSC-derived): SMPTE-C [47,48]; SD (PAL-, SECAM-derived): EBU [49,50]; HD: ITU-R BT.709 [51] | ITU-R BT.709/sRGB [51,52] |
| Color matrix | SD: ITU-R BT.601 [50]; HD: ITU-R BT.709 [51] | ITU-R BT.709 |
| Transfer characteristic | SD (NTSC-derived): power law 2.2; SD (PAL-, SECAM-derived): power law 2.8; HD: ITU-R BT.709 [51] | ITU-R BT.709 |
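The SAR/DAR relationship underlying Table 1 can be checked programmatically: DAR = (width/height) × SAR. Below is a minimal sketch in Python using exact rational arithmetic; the helper name `display_aspect_ratio` is ours, introduced for illustration only.

```python
from fractions import Fraction

def display_aspect_ratio(width: int, height: int, sar: Fraction) -> Fraction:
    """Compute DAR = (width / height) * SAR using exact rational arithmetic."""
    return Fraction(width, height) * sar

# A 704x480 NTSC-derived SD raster with SAR 40:33 displays as 16:9 widescreen,
# while the same raster with SAR 10:11 displays as 4:3.
print(display_aspect_ratio(704, 480, Fraction(40, 33)))  # 16/9
print(display_aspect_ratio(704, 480, Fraction(10, 11)))  # 4/3
```

Using `Fraction` rather than floating point avoids rounding drift when comparing against the nominal 4:3 and 16:9 display ratios.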
Table 2. Comparison of encoded video streams used in broadcast and Internet streaming.

| Stream Characteristic | Broadcast Systems | Online Platforms/Web Streaming |
|---|---|---|
| Number of outputs | single | multiple (usually 3–10), as needed to support ABR delivery [4,42,43] |
| Preprocessing | denoising, MCTF-type filters [24,53] | rarely used |
| Video codecs | MPEG-2 [54], MPEG-4/AVC [55] | MPEG-4/AVC (most deployments); HEVC [56], AV1 [57] (special cases) |
| Codec profiles, levels | fixed for each format; applicable standards: ATSC A/53 P4 [26], ATSC A/72 P1 [29], ETSI TS 101 154 [30], SCTE 128 [33] | based on target set of devices [42]; guidelines: Apple HLS [44], ETSI TS 103 285 [46], CTA 5001 [58] |
| GOP length | 0.5 s | 2–10 s |
| GOP type | open, closed | closed |
| Error resiliency features | mandatory slicing of I/IDR pictures | N/A |
| Encoding modes | CBR, VBR (with statmux); many additional constraints apply, see ATSC A/53 P4 [26], ATSC A/54 A [27], ATSC A/72 P1 [29], ETSI TS 101 154 [30], SCTE 43 [31], SCTE 128 [33] | capped VBR; max. bitrate is typically capped to 1.1x–1.5x the target bitrate; HRD buffer size is typically limited by codec profile + level constraints |
| VUI/HRD parameters | required | optional, usually omitted |
| VUI/colorimetry data | required | optional, usually included |
| VUI/aspect ratio | required | optional, usually included |
| Picture timing SEI | required in some cases (e.g., in film mode) | optional, usually omitted |
| Buffering period SEI | optional | optional, usually omitted |
| AFD/bar data/T.35 | required and carried in video ES | not used |
| Closed captions | required and carried in video ES | optional, may be carried out-of-band |
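The streaming-side constraints in Table 2 (capped VBR at 1.1x–1.5x the target bitrate, closed GOPs of 2–10 s) translate directly into encoder settings. The following Python sketch derives them from a target bitrate and frame rate; the function name `streaming_encoder_params` and default values are ours, chosen for illustration within the ranges given in the table.

```python
from fractions import Fraction

def streaming_encoder_params(target_kbps: int,
                             cap_factor: float = 1.5,
                             gop_seconds: float = 2.0,
                             fps: Fraction = Fraction(30000, 1001)) -> dict:
    """Derive capped-VBR max bitrate and closed-GOP length (in frames).
    cap_factor should stay within the 1.1-1.5 range noted in Table 2."""
    return {
        "maxrate_kbps": round(target_kbps * cap_factor),
        "gop_frames": round(gop_seconds * float(fps)),
    }

params = streaming_encoder_params(3000)
print(params)  # {'maxrate_kbps': 4500, 'gop_frames': 60}
```

Rounding the GOP length to whole frames matters for fractional NTSC-derived rates such as 30,000/1001 Hz, where a nominal 2 s GOP becomes 60 frames.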
Table 3. Combination of streaming formats and DRMs that can be used to reach different devices. Orange tick marks indicate possible, but less commonly used choices.

Columns: HLS | DASH | HLS + FairPlay | HLS + Widevine | DASH + Widevine | DASH + PlayReady | MSS + PlayReady

PCs/Browsers: Chrome 🗴🗴🗴
Firefox 🗴🗴🗴
IE, Edge 🗴🗴🗴
Safari 🗴🗴🗴🗴🗴
Mobiles: Android 🗴🗴
iOS 🗴🗴🗴🗴🗴
Set-top boxes: Chromecast 🗴🗴
Android TV 🗴🗴
Roku 🗴🗴
Apple TV 🗴🗴🗴🗴🗴
Amazon Fire TV 🗴🗴
Smart TVs: Samsung/Tizen 🗴🗴
LG/webOS 🗴🗴🗴🗴
SmartTV Alliance 🗴🗴🗴
Android TV 🗴🗴
Game Consoles: Xbox One/360 🗴🗴🗴🗴🗴
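In practice, the packaging/DRM choices summarized in Table 3 are applied by looking up the target platform at delivery time. The Python sketch below illustrates such a lookup; the dictionary is our simplified subset of well-known platform pairings (Safari/iOS with FairPlay over HLS, Chrome/Android with Widevine over DASH, Edge with PlayReady), not an exhaustive reproduction of the table, and the names `PREFERRED_DELIVERY` and `select_delivery` are hypothetical.

```python
# Illustrative platform -> (streaming format, DRM) lookup in the spirit of
# Table 3. This subset reflects common choices only and is not the paper's API.
PREFERRED_DELIVERY = {
    "safari":  ("HLS",  "FairPlay"),
    "ios":     ("HLS",  "FairPlay"),
    "chrome":  ("DASH", "Widevine"),
    "android": ("DASH", "Widevine"),
    "edge":    ("DASH", "PlayReady"),
    "xbox":    ("MSS",  "PlayReady"),
}

def select_delivery(platform: str):
    """Return (format, drm); fall back to clear (unencrypted) HLS if unknown."""
    return PREFERRED_DELIVERY.get(platform.lower(), ("HLS", None))

print(select_delivery("Safari"))  # ('HLS', 'FairPlay')
print(select_delivery("Roku"))    # ('HLS', None)
```

A real online video platform would typically derive the key from user-agent or device-registration data rather than a bare platform string.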
Table 4. Comparison of proposed architectures.

| Characteristic/Components | On-Prem Solution | Architecture 1 (Section 4.1) | Architecture 2 (Section 4.2) | Architecture 3 (Section 4.3) |
|---|---|---|---|---|
| Ingest from remote and field production | On prem | On prem | Cloud | Cloud |
| Playout | On prem | On prem | Cloud | Cloud |
| Broadcast distribution transcoding | On prem | On prem | On prem | Cloud |
| Broadcast multiplexers | On prem | On prem | On prem | Cloud |
| QAMs (Quadrature Amplitude Modulation) and links to distribution | On prem | On prem | On prem | On prem |
| OTT transcoding | On prem | Cloud | Cloud | Cloud |
| OTT packaging, DRM, and SSAI | On prem | Cloud | Cloud | Cloud |
| OTT origin servers | On prem | Cloud | Cloud | Cloud |
| Fault-tolerance and redundancy support | On prem | On prem for broadcast chain, cloud for OTT | Cloud | Cloud |
| % of physical equipment required | 100% | 50–75% | 25–50% | 10–20% |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Reznik, Y.; Cenzano, J.; Zhang, B. Transitioning Broadcast to Cloud. Appl. Sci. 2021, 11, 503. https://doi.org/10.3390/app11020503