1. Introduction
Compressed video has recently received a boost from a reduction of around 50% in bandwidth requirements arising from the High Efficiency Video Coding (HEVC) codec standard [1]. Unfortunately, either owing to a desire to reduce codec complexity or owing to a preference for commercial high-definition video-on-demand applications, low-latency video streaming is no longer strongly supported in the codec. As a result, applications such as telemedicine and video conferencing have a limited range of built-in error-resilience tools available [2]. Further, Dynamic Adaptive Streaming over HTTP (DASH), which is supported by HEVC, introduces packet re-sending delay whenever a packet is lost to errors or congestion, owing to the underlying Transmission Control Protocol (TCP) that DASH uses. Video streaming has an important role in telemedicine [3], both in longer-term monitoring and in emergency responses, where the need for low-latency communication is probably strongest. For home medical advice it would be helpful if face-to-face consultation with a remote clinician were possible. This assumes that HEVC encoding delays can be addressed, as encoding currently takes many times longer than real time [4].
For interactive video streaming with an associated speech channel, delay is harmful to synchronization at both ends of the communication link. Digital Subscriber Line (DSL) is the dominant broadband access network for residential users, with 364.1 million links in 2012 [5]. However, impulse noise is a potent source of DSL transmission errors, resulting in fixed-length error bursts that cause multiple packet losses, the number of which depends on the transmission rate. Sources of error bursts vary from street lights through faulty set-top boxes to flashing Christmas tree lights [6]. As a way of responding to packet error bursts, Automatic Repeat ReQuest (ARQ) retransmissions introduce latency, particularly if used from end to end, leaving Forward Error Correction (FEC) as the main way of reducing errors in the absence of effective error-resilience tools.
This paper considers an application-layer (AL) FEC solution to the problem of video packet loss on DSLs. If packets are lost then, owing to video-coding dependencies, video streams may be disrupted for up to 500 ms, the duration of a typical Group of Pictures (GoP) [7]. Consequently, PHYsical-layer FEC protection in the case of video is often supplemented by AL-FEC [8], which has been introduced into the main wireless standards. Typical examples of FEC codes [9] employed in packetized video streams, discussed in Section 2, are: eXclusive OR (XOR)-based codes, simple or interleaved; Low-Density Parity-Check (LDPC) codes; and Reed-Solomon (RS) codes.
However, FEC or channel codes may also introduce latency, in part owing to their computational complexity. Coding latency can be composed of: the need to accumulate sufficient data to successfully repair packets, a problem that may arise with rateless codes [10], including Raptor codes [8], leading to large input buffers; and the delay arising from the computational complexity of the coding and decoding operations, a problem with RS codes [11] as the block size increases. When rateless codes are employed adaptively, another source of latency may arise, owing to a need to request the sender to stop sending additional repair packets. Though there are now open-source versions of the Raptor codes, Raptor10 and RaptorQ [12], for research purposes, care should be taken not to infringe any of the patents associated with Raptor coding. Thus, in some cases, one can add legal complexity to computational complexity. One aim of the research in this paper was to find an effective method of channel coding that was not constrained by legal restrictions. Therefore, legally-constrained methods of channel coding, however efficient they may be in repairing erasures, are not directly included in the performance comparison in this paper.
Interleaving of packets in order to reduce the impact of error bursts also has a latency implication, again owing to the need to accumulate sufficient data before interleaving can take place. In addition, whenever the latency budget is large, the video-sending rate becomes “bursty”, which can lead to congestion and ultimately to packets being dropped from buffers. Therefore, some of the results presented in this paper examine the trade-off between error-repair capability and the latency arising from a choice of FEC method. Because it is the packet loss pattern, rather than the number of packets lost, that actually affects video distortion, other results examine the video delivered after using our recommended low-latency channel coder.
LDPC codes present rapid decoding owing to the sparseness of their parity-check matrix (refer to Section 2). However, in general the generator matrix created from the parity-check matrix is not necessarily sparse and, hence, encoding time can be very high. To reduce encoding latency, Low-Density Generator Matrix (LDGM) codes [13], with a sparse generator matrix, are attractive candidates. They can be used with relatively smaller block sizes, unlike LDPC codes, which are implemented with block sizes of about 1000 for best recovery performance. Rizzi [14] has demonstrated that the encoding latency of systematic erasure codes is linearly dependent on the block size, for sufficiently large packets. Therefore, for block sizes of 200, as herein, one can expect the encoding time to fall to about one fifth of that for a block size of 1000. The decoding time still depends linearly on the block size but also depends linearly on the number of packets lost, that is, there is a channel-dependent element.
However, unmodified LDGM codes have a non-zero error probability that is independent of the code block length, i.e., high error floors (see Section 2.2 for potential solutions to this problem). On the other hand, unlike RS codes, decoding can be iteratively refined through a belief-propagation algorithm (also known as a sum-product or iterative probabilistic decoding algorithm), rather than having to wait for all the data before decoding can begin. Thus, start-up latency is reduced, which is important for real-time streaming. Because in both [15] and [16] LDPC was selected as the most suitable channel code for Real-time Transport Protocol (RTP) video streaming, the possibilities for its reduced cousin, LDGM, are promising. Notice that, in order to avoid legal complexity, this paper also confines itself to regular LDPC and LDGM codes (ones with a constant number of equations defining each code symbol). For an irregular LDPC code that apparently has an associated patent, refer to [17]. In this paper, we consider the relevance of LDGM codes for video communication with shorter block lengths, whereas the work in [13] considered their relevance for general-purpose data communication with larger block lengths.
The remainder of this paper is organized as follows. Section 2 reviews the advantages and disadvantages of various channel codes suitable for protection of real-time video streams. Section 3 describes the methodology employed in evaluating the application of LDGM FEC. Section 4 compares those codes in terms of the trade-offs to be made across the dimensions of latency, error recovery, severity of error conditions, and resulting video distortion. In Section 5 we review related research in this area. Finally, Section 6 makes some concluding remarks before discussing future developments of this research.
2. Channel Codes
This section reviews some available channel codes from the points of view of latency, computational complexity, error-correction capability, and other implementation factors.
2.1. LDPC Codes
LDPC codes are linear block codes [18] characterized by parameters k and n, which correspond to the number of bits (assuming for the moment a bit-sized symbol) of an information vector and a code vector, respectively. Therefore, the number of redundant bits is $n - k$. The rate of such a code is represented by $r = k/n$, i.e., r decreases as the number of redundant symbols added increases.
In order to output a code vector c from an information vector u, a generator matrix G is required:
$$c = uG \qquad (1)$$
Notice that a code vector includes both input data symbols and additional parity symbols. In turn, G is created from a parity-check matrix H, which is involved in decoding when a transmission error has occurred. As the code’s name implies, matrix H is a sparse matrix with a low density of “1s” and all other entries set to zero. In general, the entries of H are filled randomly. H provides $n - k$ parity-check equations that create constraints between data symbols and parity symbols. These constraints indicate which data symbols are involved in eXclusive OR (XOR) operations to form the parity symbols. Notice that, in LDPC, data symbols contribute indirectly to the creation of the parity symbols because the constraints can be combined in an XOR operation to form a parity symbol. To create G from H, H is first re-arranged into an appropriate form such that the output will be systematic. (A systematic code is one in which the information symbols are separated from the parity symbols, allowing the information symbols to be extracted without decoding if no errors are detected.) Thus, H is represented as:
$$H = [\, P^{T} \mid I_{n-k} \,] \qquad (2)$$
where P is a matrix of dimensions $k \times (n-k)$ and $I_{n-k}$ is the identity matrix of dimensions $(n-k) \times (n-k)$. G is then created as:
$$G = [\, I_{k} \mid P \,] \qquad (3)$$
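The systematic construction can be illustrated with a minimal Python sketch. It assumes the common convention $H = [P^{T} \mid I_{n-k}]$ and $G = [I_{k} \mid P]$ with bit-sized symbols; the toy matrix P and all function names are hypothetical and chosen only for illustration.

```python
def generator_from_P(P):
    """Build G = [I_k | P] from a k x (n-k) binary matrix P."""
    k = len(P)
    return [[1 if i == j else 0 for j in range(k)] + list(P[i]) for i in range(k)]

def parity_check_from_P(P):
    """Build H = [P^T | I_(n-k)] from the same matrix P."""
    k, r = len(P), len(P[0])
    return [[P[i][j] for i in range(k)] + [1 if j == t else 0 for t in range(r)]
            for j in range(r)]

def encode(u, G):
    """Code vector c = uG with arithmetic over GF(2)."""
    return [sum(u[i] & G[i][j] for i in range(len(u))) % 2 for j in range(len(G[0]))]

def syndrome(H, c):
    """Evaluate every parity-check equation; all zeros means c is a code vector."""
    return [sum(h & x for h, x in zip(row, c)) % 2 for row in H]

# Hypothetical toy code with k = 3 and n = 5 (far too small to be "low density").
P = [[1, 0], [1, 1], [0, 1]]
G = generator_from_P(P)
H = parity_check_from_P(P)
u = [1, 0, 1]
c = encode(u, G)
print(c)               # [1, 0, 1, 1, 1]: the first k symbols are u itself
print(syndrome(H, c))  # [0, 0]
```

Because the code is systematic, the information symbols appear unchanged at the start of the code vector, and a zero syndrome confirms that every parity-check equation is satisfied.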
Though the “1s” in the H matrix are randomly generated, the number of “1s” in each row and each column is principally kept constant to reduce legal complexity, as mentioned in Section 1. An LDPC (or its lower-complexity cousin LDGM) H matrix under that constraint is called regular. For example, in a regular H matrix with three “1s” in each column, each code symbol is involved in three constraint equations. Each row represents a constraint equation. Thus, with four “1s” in a row, three data symbols are combined with a parity symbol. The constraint is then that all four symbols when XORed together will give a value of zero, hence the name parity check for matrix H.
Thus, two parameters can be defined: $w_c$, the number of “1s” in each column; and $w_r$, the number of “1s” in each row. For H also to be sparse requires $w_c \ll n - k$ and $w_r \ll n$. Then for a regular LDPC matrix:
$$w_r = w_c \, \frac{n}{n-k} \qquad (4)$$
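The regularity constraint follows from counting the “1s” of H twice, once by columns and once by rows. The short Python sketch below makes that counting explicit; it assumes H has $n - k$ rows and n columns, and the function and parameter names (`regular_row_weight`, `col_weight`) are hypothetical.

```python
def regular_row_weight(n, k, col_weight):
    """Row weight of a regular (n-k) x n parity-check matrix.

    Counting the "1s" by columns gives n * col_weight; counting them by
    rows gives (n - k) * row_weight. The two totals must agree.
    """
    total_ones = n * col_weight
    rows = n - k
    if total_ones % rows != 0:
        raise ValueError("no regular matrix exists for these parameters")
    return total_ones // rows

def is_regular(H):
    """True if every row has the same weight and every column has the same weight."""
    row_weights = {sum(row) for row in H}
    col_weights = {sum(col) for col in zip(*H)}
    return len(row_weights) == 1 and len(col_weights) == 1

# Illustrative values only: a rate-1/2 code with three "1s" per column.
print(regular_row_weight(n=1000, k=500, col_weight=3))  # 6
```

With six “1s” per row out of 1000 columns, such a matrix is clearly sparse.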
Although LDPC codes are not a Maximum-Distance Separable (MDS) code, implying that they do not offer the optimal recovery capability for a block code, they have a lower decoding computational burden compared to RS codes, owing to: the use of XOR operations to generate the redundancy when encoding; and the low density of the parity-check matrix, which results in a low number of decoding operations.
To put matrix H in the form given by equation (2), the Gauss-Jordan elimination algorithm can be used. That algorithm, in general, has complexity of order $O(n^3)$. Depending on circumstances, the creation of matrix G from H can be performed offline. However, the resulting matrix G is in general not sparse, owing to the Gauss-Jordan elimination process applied, resulting in an encoding complexity of order $O(n^2)$, with n around 1000 in large block coding.
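As a rough illustration of this step, the Python sketch below applies Gauss-Jordan elimination over GF(2), where adding two rows is a bit-wise XOR, so that the last $n - k$ columns of H become an identity matrix. It is a simplified sketch under stated assumptions: column swaps are not attempted, so it raises an error for matrices that need them, and the function name `to_systematic_form` is hypothetical.

```python
def to_systematic_form(H):
    """Reduce a binary (n-k) x n matrix so that its last n-k columns form I.

    Row operations only: swap rows to find a pivot, then XOR the pivot row
    into every other row that has a "1" in the pivot column.
    """
    H = [row[:] for row in H]           # work on a copy
    rows, n = len(H), len(H[0])
    offset = n - rows                   # first column of the identity block
    for i in range(rows):
        col = offset + i
        pivot = next((r for r in range(i, rows) if H[r][col]), None)
        if pivot is None:
            raise ValueError("column swap needed; not handled in this sketch")
        H[i], H[pivot] = H[pivot], H[i]
        for r in range(rows):
            if r != i and H[r][col]:
                H[r] = [a ^ b for a, b in zip(H[r], H[i])]
    return H

# Hypothetical 2 x 5 example.
print(to_systematic_form([[1, 1, 0, 1, 1],
                          [0, 1, 1, 1, 0]]))
# [[0, 1, 1, 1, 0], [1, 0, 1, 0, 1]]
```

The three nested loops (pivot column, row, and the XOR across a row) account for the cubic cost, and the XORs are also what fill in the left-hand block and destroy its sparseness.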
It is possible to rearrange matrix H into an approximate lower-triangular form [19] so that it retains its sparseness, even after Gauss-Jordan elimination, because only some of the sub-matrices are affected. The order of encoding complexity then becomes $O(n + g^2)$, where g is a small constant or scales as a small fraction of n [20]. The algorithm’s software complexity does increase as a result, whatever the theoretical computational complexity. Therefore, LDGM is an alternative way to create a sparse generator matrix, as now discussed.
2.2. LDGM Codes
A simplified version of LDPC codes is represented by LDGM codes, for which the parity-check matrix H corresponds to the generator matrix G [13], i.e., H is employed directly in encoding. In the LDGM approach, the parity-check matrix H has a size of $(n-k) \times k$, compared to the LDPC case in Section 2.1 of $(n-k) \times n$. An identity matrix of size $(n-k) \times (n-k)$ augments H in order to associate each parity symbol with a set of data symbols identified by H. Thus, augmented H, $H_a$, has the form:
$$H_a = [\, H \mid I_{n-k} \,] \qquad (5)$$
It is interesting to compare the LDGM codes with the Staircase codes of [20] for fast encoding. Though in [20] these are called LDPC codes, they are in fact also a type of LDGM code, as the first k data symbols are combined through H. However, rather than augment H with an identity matrix, H is augmented with a diagonal matrix, $S$, with just two “1s” in each row (except for just one “1” in the first row) and size $(n-k) \times (n-k)$. The form of $S$, which is of a descending staircase of “1s” from the viewer’s left to right, gives rise to the name “Staircase”. The form of the Staircase code matrix $H_s$ is:
$$H_s = [\, H \mid S \,] \qquad (6)$$
The Staircase code can be used in an iterative fashion to create the parity symbols. The author of [20] points out that this Staircase code has linear encoding complexity if account is taken of the sparseness of H. Thus, LDGM codes also have linear encoding complexity. A regular LDGM code, as used in this paper, is constrained by:
$$w_r = w_c \, \frac{k}{n-k} \qquad (7)$$
where $w_c$ and $w_r$ are the numbers of “1s” in each column and each row of H, respectively.
When encoding with H, only the first k symbols contribute to encoding, compared to the LDPC case, for which n symbols have to be processed. The disadvantage of this arrangement is that parity symbols are only protected by one subset of data symbols, implying that the error-correction capability is reduced. Despite the reduction in recovery performance, LDGM codes have several potential advantages. First, as the encoding complexity is lower than that of LDPC codes and decoding is similarly of low complexity (the decoding algorithms are the same), they are suited to encoding and decoding on a variety of battery-powered mobile devices. Second, low values of k imply that, for real-time video applications such as telemedicine or video conferencing, the latency budget is considerably reduced.
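The difference between the identity-augmented and Staircase forms can be sketched in a few lines of Python. This is an illustrative sketch only: symbols are modelled as small integers standing in for packet payloads, the matrix H and the data values are hypothetical, and the function names are not taken from any library.

```python
def ldgm_parity(H, data):
    """Identity-augmented form: parity i is the XOR of the data symbols
    selected by the "1s" in row i of H."""
    parity = []
    for row in H:
        p = 0
        for bit, symbol in zip(row, data):
            if bit:
                p ^= symbol
        parity.append(p)
    return parity

def staircase_parity(H, data):
    """Staircase form: parity i additionally XORs in parity i-1, which is
    the effect of the second "1" in each row of the staircase block."""
    parity, previous = [], 0
    for p in ldgm_parity(H, data):
        previous ^= p
        parity.append(previous)
    return parity

# Hypothetical regular H with k = 4 and n - k = 2 (one "1" per column, two per row).
H = [[1, 0, 1, 0],
     [0, 1, 0, 1]]
data = [0x12, 0x34, 0x56, 0x78]
print(ldgm_parity(H, data))       # [68, 76]
print(staircase_parity(H, data))  # [68, 8]
```

In both cases each parity symbol costs only as many XORs as there are “1s” in its row, which is why the encoding work grows linearly with the block size when H is sparse.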
There is a downside: LDGM codes have high error floors that cannot necessarily be reduced by increasing the block size. (When, after application of FEC, the bit error rate ceases to reduce with increased Signal-to-Noise Ratio (SNR), an error floor is said to exist.) However, at least for a binary symmetric channel (BSC), it has been analytically demonstrated [21] that concatenating two LDGM codes (applying one LDGM code after another) overcomes the onset of an error floor, while retaining LDGM’s computational complexity, provided a belief-propagation (message-passing) decoding algorithm is employed. Later work [22] confirmed the findings of [21] for a Rayleigh channel and provides analysis on how best to configure LDGM codes.
In this paper, we simulate an erasure channel, which is not necessarily open to analysis in the way a BSC is but nevertheless occurs in practice after PHY-layer error recovery fails to recover a packet. An erasure channel is distinguished by the property that the positions of corrupted symbols are known in advance, sometimes because an upper-layer protocol records the packet sequence numbers. Though a concatenated LDGM code was not used in the experiments of Section 4, video quality was still found to be good. The LDGM improvement in Section 3.2 is not intended as a remedy for high error floors, though it does improve the coding efficiency. However, as with Raptor codes [12], which already use concatenated codes, the effect of introducing concatenation is expected to be simply a linear increase in coding complexity.
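On an erasure channel, message-passing decoding reduces to a simple iterative procedure: any check equation with exactly one erased symbol can be solved for that symbol, which may in turn unlock further equations. The Python sketch below illustrates this for the identity-augmented LDGM form. It is a minimal sketch, not the decoder used in the experiments; `None` marks an erased position, and the matrix, data values, and function name are hypothetical.

```python
def peel_decode(H, received):
    """Iteratively recover erased symbols of an LDGM code vector.

    `received` holds k data symbols followed by n-k parity symbols, with
    None at erased positions. Row i of H together with parity symbol i
    forms one check: the XOR of all its symbols is zero.
    """
    k = len(H[0])
    symbols = list(received)
    # Positions in the code vector that take part in each check.
    checks = [[j for j, bit in enumerate(row) if bit] + [k + i]
              for i, row in enumerate(H)]
    progress = True
    while progress:
        progress = False
        for positions in checks:
            missing = [p for p in positions if symbols[p] is None]
            if len(missing) == 1:       # exactly one unknown: solve for it
                value = 0
                for p in positions:
                    if symbols[p] is not None:
                        value ^= symbols[p]
                symbols[missing[0]] = value
                progress = True
    return symbols                      # any remaining None was unrecoverable

# Hypothetical example: k = 4 data symbols, 2 parity symbols, two erasures.
H = [[1, 0, 1, 0],
     [0, 1, 0, 1]]
received = [0x12, None, None, 0x78, 0x44, 0x4C]
print(peel_decode(H, received))  # [18, 52, 86, 120, 68, 76]
```

Decoding can start as soon as the first check has a single unknown, and the work done grows with the number of erased symbols, which matches the channel-dependent element of decoding time noted in Section 1.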
2.3. Pro-MPEG COP #3
Professional-MPEG code of practice #3 (Pro-MPEG COP #3) [23] is an industry standard for video transmission protection that is widely deployed. Incoming packets are arranged in a matrix on a row-by-row basis, assuming packet-sized symbols. Redundant packets are subsequently appended to each column of the matrix and optionally to each row of packets. The packets are transmitted column-by-column, i.e., orthogonally to the way they were read into the matrix. The redundant packets are created by a byte-wise XOR operation across the packets of each column/row. This simple interleaving scheme has the advantage of convenient hardware implementations. The standard restricts the number of columns and rows to a maximum of 20.
In this paper, the one-dimensional version of Pro-MPEG COP #3 is tested, in which redundant packets are only created for the columns. One-dimensional Pro-MPEG COP #3 was also selected in [10] for the reason that it is more widely deployed. The number of rows was set to four (the minimum) and the number of packets in a column to 20. This is the same configuration as employed in [24] as part of an unequal loss protection (ULP) scheme, according to the video frame type (I-, B-, and P-type) priority. In the current paper, for ease of comparison, ULP is not used.
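The column-only scheme can be sketched as follows in Python. This is a simplified illustration of byte-wise XOR protection over the columns of a row-by-row packet matrix, not an implementation of the Pro-MPEG COP #3 packet format; the matrix dimensions, payloads, and function names are hypothetical.

```python
def xor_packets(packets):
    """Byte-wise XOR of equal-length packets."""
    out = bytearray(len(packets[0]))
    for pkt in packets:
        for i, b in enumerate(pkt):
            out[i] ^= b
    return bytes(out)

def column_fec(packets, columns):
    """One repair packet per column, for packets written row by row.

    Packet i sits in column i % columns, so column c is packets[c::columns].
    """
    return [xor_packets(packets[c::columns]) for c in range(columns)]

def recover_column(received, repair):
    """Rebuild the single missing packet (None) in one column."""
    present = [p for p in received if p is not None]
    if len(present) != len(received) - 1:
        raise ValueError("exactly one loss per column can be repaired")
    return xor_packets(present + [repair])

# Hypothetical example: 8 four-byte packets in a matrix of 2 rows by 4 columns.
packets = [bytes([i] * 4) for i in range(1, 9)]
repair = column_fec(packets, columns=4)
# Column 0 holds packets[0] and packets[4]; suppose packets[0] is lost.
restored = recover_column([None, packets[4]], repair[0])
print(restored == packets[0])  # True
```

A single XOR repair packet can restore only one lost packet per column, which is why this simple scheme struggles once an error burst removes two or more packets from the same column.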
2.4. RS Codes
RS codes have the MDS property, i.e., any k packets can be received to recover the k information packets, whereas around 1.05% of packets are needed [16] for full recovery in LDPC. RS codes are linear, cyclic codes, formed by sequences of m-bit symbols, each of which belongs to a finite Galois Field, i.e., GF($2^m$), where m takes values greater than two. n is set to the value $2^m - 1$. If m is greater than eight the RS computational complexity can be prohibitive. Specifically, the total complexity is O( ), even when using a fast frequency-domain algorithm [25].
RS FEC in the common intra-packet approach works by grouping k packets at a time. From each of the k packets, the first m-bits are extracted to form k m-bit symbols. These k symbols are employed to generate redundant symbols by means of the RS algorithm. The redundant symbols are then packed as the first m-bit symbols of parity packets. The intra-packet algorithm continues by extracting the next set of m-bit symbols and forming m-bit redundant symbols and packing these as the next set of symbols in the parity packets.
In [26], the interleaving factor was increased by forming each m-bit symbol by extracting one bit from each of m packets in turn. This alternative inter-packet approach improves the error-recovery performance in “bursty” error conditions, as the loss of any one symbol affects only one bit per packet. However, whereas the latency budget of the intra-packet scheme is the time for k packets to arrive, in the inter-packet scheme it is the time for $k \times m$ packets to arrive at the sender.
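The effect on the latency budget can be estimated with a small Python sketch. It assumes that each of the k symbols in the inter-packet scheme draws one bit from m distinct packets, so that k times m packets must be accumulated, and that packets arrive at a constant rate; the packet rate and the values of k and m below are hypothetical.

```python
def accumulation_delay_ms(packets_needed, packet_rate_pps):
    """Time for the sender to gather `packets_needed` packets, in milliseconds."""
    return 1000.0 * packets_needed / packet_rate_pps

def rs_latency_budgets(k, m, packet_rate_pps):
    """Return (intra-packet, inter-packet) accumulation delays in milliseconds."""
    intra = accumulation_delay_ms(k, packet_rate_pps)
    inter = accumulation_delay_ms(k * m, packet_rate_pps)
    return intra, inter

# Hypothetical values: k = 20 packets, 8-bit symbols, 400 packets per second.
print(rs_latency_budgets(k=20, m=8, packet_rate_pps=400.0))  # (50.0, 400.0)
```

Under these assumed values the inter-packet scheme waits m times longer before encoding can begin, which is the price paid for its better resistance to error bursts.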
5. Related Work
The impact of impulse noise on an ADSL2+ link is documented in [42]. Evidently there is severe “blockiness”, where runs of macroblocks (MBs) have been lost and error concealment at the codec decoder has failed to replace the MBs in an unobtrusive manner.
In [16], a simple 2D FEC code (with similarities to Pro-MPEG COP #3), RS codes, and LDPC were evaluated by embedding them in Linux RTP protocol stacks that included RTP packetization with IP/UDP headers. Packet erasures were uniformly distributed. The coding rate was high at 2/3. The authors reported that above 30% PERs, no codes could repair all packets but LDPC was only slightly worse than RS in error recovery. The simple 2D FEC code was noticeably worse than both LDPC and RS. Comparing latencies, RS codes introduced a maximum average delay of 544 ms, followed by LDPC with block lengths of 1000 at 402 ms and the simple 2D FEC codes at around 50 ms. Lower block lengths for LDPC reduced latency but decreased error recovery. These results confirm that a compromise code, such as LDGM (or low block length LDPC), is a good option for interactive video. Computational overhead for the simple 2D FEC code and LDPC remained below 8% whatever the PER, whereas the RS code computational overhead climbed steeply as the error rate increased.
As part of the OpenFEC project, [43] examined the same codes as in [16] but with measured error traces from ADSL links rather than the random drops of [16]. These results confirmed those of [16] for random losses but showed that even at loss rates below 10% and with a code rate of 2/3, when error bursts occurred not all packets could be recovered. As in [16], an LDPC code with a block length as low as 170 was found to be competitive with RS codes in error-recovery terms. Because not all packets could be recovered even at low loss rates, the ability to retransmit was recommended, which is also the recommendation of [29].
In [32], a number of interesting points of comparison are made between some of the channel codes mentioned herein. The main limitation of the LDPC family of codes in comparison to rateless non-patented online codes [44] is that they are not rateless, in the sense that the maximum number of parity packets must be defined in advance. LDPC is further disadvantaged, compared to rateless and LDGM codes, in that it has to store all the source parity packets during the encoding process. The family of codes performs most efficiently when the FEC overhead is low to medium. Staircase LDGM codes were preferred by the authors of [32] but they employed large block sizes (as they had in [13]) and were unaware of possible improvements such as that of Section 3.2.
The authors of [39] considered packet interleaving as an alternative response to bursty losses. As in Pro-MPEG COP #3 (refer to Section 2.3), the intention was to spread packet error bursts so that they no longer affect consecutive packets of the source packet stream. Unlike Pro-MPEG COP #3, channel coding was not incorporated into the interleaving scheme. Instead, packet interleaving was introduced at the IP-packet level irrespective of content. However, in [45], the authors demonstrate that if interleaving is also introduced into rate-controlled streams then the possibility exists of an adverse interaction with the congestion control algorithm. For an interleaving block of 48 UDP/IP packets, the impact on end-to-end latency was estimated to be an acceptable 65 ms. Smaller interleaving block sizes could be employed, depending on error burst conditions. The scheme was most appropriate to the small burst-length conditions present in satellite links. In [46], packet interleaving took place before video compression. Though error resilience was improved, the loss of temporal redundancy, owing to interleaving before the codec, resulted in a degradation of coding efficiency. Interleaving of base-layer and enhancement-layer packets in scalable coding does not suffer from the same weakness and in [47], along with other techniques, resulted in greater resistance to error bursts. The work in [24] dynamically introduced Pro-MPEG COP #3 into an RTP video packet stream. Depending on bitrate constraints and the channel conditions, selected video frames were protected. In particular, I-frames were protected because of their impact on later frames.
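The way packet interleaving spreads an error burst can be shown with a generic block interleaver in Python. This is a minimal sketch of the general technique, not the specific scheme of any of the cited works; the block length, the depth, and the function names are hypothetical.

```python
def interleave(packets, depth):
    """Block interleaver: write row by row into `depth` rows, read column by column."""
    if len(packets) % depth:
        raise ValueError("block length must be a multiple of depth")
    width = len(packets) // depth
    return [packets[r * width + c] for c in range(width) for r in range(depth)]

def deinterleave(packets, depth):
    """Inverse of interleave for the same block length and depth."""
    width = len(packets) // depth
    return [packets[c * depth + r] for r in range(depth) for c in range(width)]

# Hypothetical block of 12 packets, identified by sequence number.
block = list(range(12))
sent = interleave(block, depth=3)
print(sent)        # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
# A burst that removes three consecutive transmitted packets...
print(sent[3:6])   # [1, 5, 9]: these are four apart in the source stream
print(deinterleave(sent, depth=3) == block)  # True
```

The sender must hold a full block before it can transmit the first column, which is the source of the interleaving latency discussed above.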
Apart from interleaving, network coding has also been deployed [48] to counter error bursts occurring in wireless multi-hop networks. Alternatively, if it is possible to adaptively route packets based on measured link conditions, as it is with the Cognitive Packet Network [49], then multimedia streams can gain in quality and become more resilient to error bursts. Again, in content-aware networks, transmitting redundant packets can counter packet losses [50].
6. Conclusions
This study has shown that industry-standard 2D parity codes underperform in terms of combined latency and error recovery. RS codes are attractive in terms of error recovery but are not so attractive for low-latency applications of video streaming. Whenever there is interactivity, RS codes at the application layer potentially result in a lack of synchronization between two communicating parties. This problem arises owing to their computational complexity and the need to apply them in an interleaving mode, which increases the latency budget. As an alternative, this paper proposes that LDGM codes with small block lengths represent a natural candidate for low-latency interactive video streaming and, as results quoted in this paper indicate, can lead to up to a 4 dB reduction in video distortion for active sports sequences. The coding overhead is just a 9% increase in data rate. Furthermore, by rearranging columns of the parity-check matrix with the least error-recovery properties it is possible to improve the average response to error bursts. Given that rateless codes may have patents applied, LDGM codes offer a further commercial advantage, because licensing fees no longer apply. The latter advantage makes LDGM codes suitable for applications such as telemedicine in which video streaming does not generate a compensating revenue stream.
Future work will check the performance of these codes against video content other than the sports sequences so far investigated. The HEVC codec is aimed at high definition (HD) video and, hence, HD video should also be investigated, confirming the real-time performance. In general, as the number of packets in HD video communication is much larger than for standard definition (SD) video, the impact of any one packet loss in terms of error propagation is expected [51] to be less than that for SD video. Consequently, a pure FEC technique such as in this paper may be even more effective for HD video. In general, the real-time transmission and FEC coding time will scale linearly according to the number of packets in an HD video frame relative to an SD frame. Thus in [44], there were 31 and 68 rows of macroblocks for SD and HD respectively, with coding at rates of 4 and 16 Mbps respectively using H.264/AVC. HEVC [1] has approximately 50% more coding efficiency than H.264/AVC, which clearly saves bandwidth. However, paradoxically, increased coding efficiency reduces the error protection arising from LDGM FEC (at the same rate as applied to H.264/AVC streams) because greater coding efficiency implies greater dependencies between the bits in the video bitstream.