A Novel Multimedia Player for International Standard—JPEG Snack

Advances in mobile communication technologies have led to a daily increase in the consumption of short-form digital content. Because this short-form content is mainly image-based, the Joint Photographic Experts Group (JPEG) introduced a novel international standard, JPEG Snack (International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) IS 19566-8). In JPEG Snack, multimedia content is embedded into a main background JPEG file, and the resulting JPEG Snack file is saved and transmitted as a .jpg file. On a device without a JPEG Snack Player, the decoder treats the file as an ordinary JPEG file and displays only the background image. Because the standard was introduced only recently, a JPEG Snack Player is needed. In this article, we present a methodology for developing a JPEG Snack Player. The JPEG Snack Player uses a JPEG Snack decoder and renders media objects on the background JPEG image according to the instructions in the JPEG Snack file. We also present results and computational complexity metrics for the JPEG Snack Player.


Introduction
This article proposes a novel multimedia player for JPEG Snack and presents experimental results that demonstrate its operation. The significant contributions of the article are as follows:
• a description of the JPEG Snack encoded file;
• a description of the JPEG Snack decoder;
• a description of the JPEG Snack system decoder;
• the development of a novel multimedia player for JPEG Snack files, as no multimedia player yet exists for the JPEG Snack standard, which is currently at the publication stage as an international standard;
• an analysis of the complexity of the player.
The rest of the article is organized as follows: Section 2 summarizes the related work. Section 3 briefly explains the JPEG Snack encoded file, and Section 4 discusses the JPEG Snack Player in detail. Section 5 presents the experimental results. Section 6 compares the features of the JPEG Snack Player with those of other media players. Section 7 presents the limitations and future directions. Finally, Section 8 concludes the work.

JPEG Snack Encoded File
According to ISO/IEC International Standard (IS) 19566-8 [14], a JPEG Snack file follows the ISO/IEC 10918-1 file format. In a JPEG Snack file, the application 11 (APP11) marker segments that carry the JPEG universal metadata box format (JUMBF) box [31] with the JPEG Snack representation and metadata are placed after the start of image (SOI) marker, whereas the APP11 marker segments that embed the media data can be placed anywhere before the start of scan (SOS) marker. Figure 1 shows the file organization of a JPEG Snack file.
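The marker placement described above can be checked with a short scan of the JPEG marker segments. The following Python sketch walks the segments that follow SOI and collects every APP11 (0xFFEB) payload that appears before SOS (0xFFDA). It relies only on the general ISO/IEC 10918-1 rule that a marker segment carries a two-byte big-endian length that counts the length field itself; the function name `app11_segments` is hypothetical, and the sketch does not interpret the JUMBF content of the payloads.

```python
import struct

SOI, SOS, EOI, APP11 = 0xD8, 0xDA, 0xD9, 0xEB
# Standalone markers that carry no length field (TEM and RST0-RST7).
STANDALONE = {0x01, *range(0xD0, 0xD8)}


def app11_segments(data: bytes) -> list[tuple[int, bytes]]:
    """Return (offset, payload) for every APP11 segment that precedes SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file: missing SOI marker")
    pos, found = 2, []
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:            # fill byte in front of a marker
            pos += 1
            continue
        if marker in STANDALONE:      # no length field to skip
            pos += 2
            continue
        if marker in (SOS, EOI):      # entropy-coded data (or end of image) follows
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])  # includes its own 2 bytes
        if length < 2:
            raise ValueError(f"invalid segment length at offset {pos}")
        if marker == APP11:
            found.append((pos, data[pos + 4:pos + 2 + length]))
        pos += 2 + length
    return found
```

Comparing the returned offsets with the position of the SOS marker is one way to confirm that all embedded APP11 segments respect the placement rule above.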

JUMBF Box for JPEG Snack
A JUMBF box for JPEG Snack consists of one JPEG Snack description box (JSDB), one instruction set box (INST), and multiple object metadata boxes (OBMBs).

JSDB
A JSDB contains the number of objects and the start time required for the JPEG Snack representation.

INST
An INST contains the information and instructions about the representation of the JPEG Snack composition.

OBMB
Each OBMB contains the media type of the media object it describes. These media types are listed in Table 1. These boxes are explained in detail in [14].
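The box hierarchy above can be modelled as plain data types. The Python sketch below is a hypothetical in-memory representation, not the binary layout defined in [14]: the class and field names, the millisecond unit for the start time, and the check that the number of OBMBs matches the object count stored in the JSDB are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class JSDB:
    """JPEG Snack description box: number of objects and start time."""
    num_objects: int
    start_time_ms: int  # unit assumed to be milliseconds


@dataclass
class INST:
    """Instruction set box; instructions are kept as opaque dicts in this sketch."""
    instructions: list[dict] = field(default_factory=list)


@dataclass
class OBMB:
    """Object metadata box: one per embedded media object."""
    box_id: int
    media_type: str  # e.g. 'image/jpg' or 'text/utf-8'
    location: str    # e.g. 'self#jumbf=Object 1'


@dataclass
class SnackJUMBF:
    """One JSDB, one INST and any number of OBMBs."""
    jsdb: JSDB
    inst: INST
    obmbs: list[OBMB]

    def validate(self) -> None:
        """Raise ValueError if the OBMBs disagree with the JSDB object count."""
        if len(self.obmbs) != self.jsdb.num_objects:
            raise ValueError(
                f"JSDB declares {self.jsdb.num_objects} objects, "
                f"but {len(self.obmbs)} OBMBs were found")
        ids = [o.box_id for o in self.obmbs]
        if len(ids) != len(set(ids)):
            raise ValueError("duplicate OBMB identifiers")
```

Typing the JSDB and INST fields as single values encodes the "exactly one" rule directly, while the list of OBMBs captures the "multiple" part.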

Role of Sensors
A JPEG Snack file needs data from visual sensors, such as cameras, and sound sensors, such as microphones. As explained above, a JPEG Snack file contains embedded multimedia data, so the inputs to the JPEG Snack encoder can be portable network graphics (PNG) or JPEG-1 images captured with a camera, videos recorded with a camera, or audio recorded with a microphone. The sensors used for images and videos include those in compact cameras [32], 360° cameras [33], digital single-lens reflex (DSLR) cameras, and adventure cameras, while microphone sensors are used for audio. The role of sensors in the JPEG Snack file is illustrated in Figure 2.

Methodology
The backbone of the JPEG Snack Player is the JPEG Snack decoder. The JPEG Snack decoder decodes the JPEG Snack file, and the decoded information is rendered. The JPEG Snack Player displays JPEG Snack representations based on the layer and position information obtained from the JPEG Snack decoder. The high-level flow diagram of the JPEG Snack decoder is shown in Figure 3.

JPEG Snack Decoder
Three components are required to decode a JPEG Snack file: (a) the default background JPEG image, (b) the playback timeline, and (c) the layer and position of each embedded object. These components are shown in Figure 4. JPEG Snack decoders decode the default background image and interpret the instructions for displaying embedded objects on it. The default image is a JPEG-1 background image with the JPEG Snack content embedded using APP11 markers. The timeline of an embedded object specifies when it appears on the background image and for how long. The layer and position of an embedded object specify on which portion of the default image it is displayed and at what size.
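The timeline and the layer-and-position information can be combined into a single visibility test. In the Python sketch below, the `Placement` record, its field names, and the rule that an object is visible from its start time until its life has elapsed are simplifying assumptions, not the instruction parameters defined in [14]; the example values in the usage note are illustrative and are not taken from the appendix tables.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """Hypothetical per-object instruction: when and where an object is shown."""
    start_ms: int  # offset from the start of playback
    life_ms: int   # how long the object stays visible
    x: int         # top-left corner on the background image, in pixels
    y: int
    width: int
    height: int
    layer: int     # larger values are drawn on top

    def visible_at(self, t_ms: int) -> bool:
        """True while t_ms lies in the half-open interval [start, start + life)."""
        return self.start_ms <= t_ms < self.start_ms + self.life_ms

    def fits(self, bg_width: int, bg_height: int) -> bool:
        """True if the object lies entirely inside the background image."""
        return (self.x >= 0 and self.y >= 0
                and self.x + self.width <= bg_width
                and self.y + self.height <= bg_height)


def visible_objects(placements: dict[str, Placement], t_ms: int) -> list[str]:
    """Names of the objects visible at t_ms, bottom layer first (drawing order)."""
    shown = [(p.layer, name) for name, p in placements.items() if p.visible_at(t_ms)]
    return [name for _, name in sorted(shown)]
```

For instance, with one object starting at 2000 ms for 2000 ms and a second starting at 3000 ms for 1000 ms, the first object is visible alone at 2500 ms, both are visible at 3500 ms, and neither is visible from 4000 ms onward.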

JPEG Snack System Decoder
A JUMBF parser delivers the JPEG Snack stream to the system decoder. JPEG Snack streams contain media data as well as metadata describing the object structures and the composition. The appropriate media decoders are invoked, and the compositor uses the object descriptions to control playback on the local device. The JPEG Snack system decoder, shown in Figure 5, takes JPEG codestream data as input. There are two types of embedded JUMBF boxes: JPEG Snack content type JUMBF boxes and embedded file content type JUMBF boxes. Metadata are stored in the JPEG Snack content type JUMBF box, whereas media data are stored in the embedded file content type JUMBF box. The JUMBF parser extracts the metadata and passes them to an object composer, which extracts the media format, time, and position of each object from the parser output. The media decoder takes the media format, time, and media data as inputs and outputs the decoded media files. Media decoders can decode images as well as other media formats. Finally, the media output and the z-order from the object composer are sent to the compositor, which creates the snack representation and displays it according to the playback timeline.
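The data flow through the system decoder can be sketched as three small functions. In the Python sketch below, the box kinds "snack" and "file", the metadata keys, and the function names are hypothetical stand-ins for the JPEG Snack content type and embedded file content type JUMBF boxes; real media decoders would replace the placeholder callables passed in `decoders`.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class JumbfBox:
    kind: str      # hypothetical labels: "snack" = metadata box, "file" = embedded media box
    label: str
    payload: Any


@dataclass
class ObjectDesc:
    media_format: str
    start_ms: int
    position: tuple[int, int]
    z_order: int
    location: str  # label of the "file" box that holds this object's media data


def jumbf_parse(boxes: list[JumbfBox]) -> tuple[list[dict], dict[str, Any]]:
    """JUMBF parser: split boxes into metadata records and a label -> media map."""
    metadata = [b.payload for b in boxes if b.kind == "snack"]
    media = {b.label: b.payload for b in boxes if b.kind == "file"}
    return metadata, media


def object_composer(metadata: list[dict]) -> list[ObjectDesc]:
    """Object composer: extract format, time, position, layer and location per object."""
    return [ObjectDesc(m["format"], m["start_ms"], m["position"], m["z"], m["location"])
            for m in metadata]


def system_decode(boxes: list[JumbfBox],
                  decoders: dict[str, Callable[[Any], Any]]) -> list[tuple]:
    """Decode each object; return (z_order, start_ms, position, output) for the compositor."""
    metadata, media = jumbf_parse(boxes)
    layers = []
    for obj in object_composer(metadata):
        decode = decoders[obj.media_format]  # choose the media decoder by format
        layers.append((obj.z_order, obj.start_ms, obj.position,
                       decode(media[obj.location])))
    # Compositor ordering: lowest z-order first, so higher layers are drawn on top.
    return sorted(layers, key=lambda layer: layer[0])
```

Keeping metadata and media in separate maps mirrors the two box types: the composer never touches media bytes, and each media decoder sees only the data its object references.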

JPEG Snack Player Algorithm
The JPEG Snack Player follows Algorithm 1. First, the JPEG Snack file is decoded; the background JPEG image is then taken from the decoded media files and displayed on the player's screen. After the background image is shown, the embedded media files are displayed according to the layer and position information associated with each of them. The embedded media files can be audio, videos, captions, and groups of images, and the media type of each file determines how it is handled. If the media type is audio, an audio player plays the audio over the background image at the time specified in the JPEG Snack file. If the media type is image, the image is displayed on the background image at the position specified in the JPEG Snack file. Similarly, if the media type is video, a video player plays the video at the specified position on the background JPEG image. Captions are overlaid on the background image according to the information in the JPEG Snack file. Figure 6 shows a visual representation of the steps involved in the JPEG Snack Player algorithm.
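The media-type dispatch in Algorithm 1 can be sketched as a schedule of playback events. The Python sketch below is an illustration under stated assumptions: the handler names, the use of the MIME major type (the part before "/") to choose a handler, and the `EmbeddedObject` record are hypothetical, and a real player would hand each event to an audio, video, or image component instead of returning strings.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from MIME major type to the player component that handles it.
HANDLERS = {
    "audio": "play audio",
    "image": "draw image",
    "video": "play video",
    "text": "overlay caption",
}


@dataclass
class EmbeddedObject:
    name: str
    media_type: str                             # e.g. "audio/mpeg", "image/jpg", "text/utf-8"
    start_ms: int
    position: Optional[tuple[int, int]] = None  # audio has no on-screen position


def playback_schedule(objects: list[EmbeddedObject]) -> list[tuple[int, str, str]]:
    """Return (time_ms, action, name) events: background first, then by start time."""
    events = [(0, "show background", "background")]
    for obj in objects:
        major = obj.media_type.split("/")[0]
        if major not in HANDLERS:
            raise ValueError(f"unsupported media type: {obj.media_type}")
        events.append((obj.start_ms, HANDLERS[major], obj.name))
    # sorted() is stable: the background stays ahead of any object that also starts
    # at 0 ms, and objects with equal start times keep their order in the file.
    return sorted(events, key=lambda event: event[0])
```

Relying on a stable sort keeps the "background first" step of the algorithm without a special case for objects that start at time zero.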

Experimental Results
The JPEG Snack Player enables users to play JPEG Snack files in three different modes. When the Select Files button is pressed, the user can pick a JPEG Snack file; in the first example, the file contains two embedded objects, and the values of its JSDB, INST, and OBMBs are listed in Tables A1-A4 in Appendix A. When the file is selected, the background image is displayed in the plot area of the player, as shown in Figure 7. Because this file embeds two objects, both are displayed according to the instructions. The first object appears after two seconds, as the start time is 2000 ms, as shown in Figure 8a. Object 1 persists, and the second object appears after three seconds, as shown in Figure 8b.

JPEG Snack files can embed images, audio, videos, groups of images, and captions, and the JPEG Snack Player can play all of these media types on the background JPEG image. Figure 9 shows the JPEG Snack Player playing a JPEG Snack file in which a group of photos is embedded; the values of its JSDB, INST, and OBMB are presented in Tables A5-A7 in Appendix B. In this example, when the JPEG Snack file is selected, the background JPEG image is displayed, as shown in Figure 9a. When the file is played, the first image of the sequence appears after two seconds (Figure 9b), and the second image appears after three seconds (Figure 9c). After four seconds, all the pictures disappear.

Similarly, Figure 10 shows the JPEG Snack Player playing a JPEG Snack file in which a JPEG-1 image and a caption are embedded in the background JPEG file. After two seconds, the first object, i.e., the JPEG-1 image, appears on the background image, and after three seconds, the embedded caption appears. The JPEG Snack Player extracts the media type of each embedded file and plays it accordingly. The values of the JSDB, the INST, the OBMB for Object 1, and the OBMB for Object 2 are presented in Tables A8-A11 in Appendix C.

Likewise, Figure 11 shows the JPEG Snack Player playing a JPEG Snack file in which a JPEG-1 image and an MP4 video are embedded in the background JPEG file. After two seconds, the first object, i.e., the JPEG-1 image, appears on the background image, and after three seconds, the embedded video appears. The video plays for a short time and then disappears, because its persistence value is zero. The values of the JSDB, the INST, the OBMB for Object 1, and the OBMB for Object 2 are presented in Tables A12-A15 in Appendix D.

We also evaluated the performance of the JPEG Snack Player. The JPEG Snack Player application takes 9.8 MB of disk space during execution, and the total application installer size is 2.6 MB.
We also evaluated the decoding time of the JPEG Snack decoder and of the JPEG Snack Player. Table 2 and Figure 12 compare the decoding times in seconds. The decoding time was measured on a laptop with a 7th-generation Intel Core i5 processor running at 2.60 GHz per core, 8 GB of random access memory (RAM), and a 512 GB solid-state drive (SSD). The laptop is manufactured by Hewlett-Packard (HP), Palo Alto, California, United States.
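A harness along the following lines could be used to collect decoding-time measurements of this kind; it is a sketch, not the procedure used to produce Table 2. The `decode` argument stands for any hypothetical decoder callable, and repeating each run and reporting the minimum and median is an assumed design choice that reduces the influence of operating-system noise on a single measurement.

```python
import statistics
import time
from typing import Callable


def time_decoder(decode: Callable[[bytes], object], data: bytes,
                 runs: int = 10) -> dict[str, float]:
    """Call `decode(data)` `runs` times and summarise wall-clock durations in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    durations = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        decode(data)
        durations.append(time.perf_counter() - start)
    return {
        "min_s": min(durations),
        "median_s": statistics.median(durations),
        "max_s": max(durations),
    }
```

Reading the file into memory before calling the harness keeps disk access out of the measured interval, so the figures reflect decoding work only.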

Limitations and Future Directions
The current version of the software is available only for personal computers. It could be extended to a smartphone app by importing the JPEG Snack file decoder as a library into an Android application that processes JPEG Snack files. It could also be extended to a web-based version so that JPEG Snack files can be viewed online. Furthermore, a JPEG Snack editor could be included in the JPEG Snack Player so that users can update and customize the embedded content of JPEG Snack files within the player.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. First Example JUMBF Boxes
Tables A1-A4 present the values of JSDB, INST, and OBMB for the first object and OBMB for the second object, respectively, for the first example.

Parameter | Value | Description
… | … | This means that the second object will not persist
Life | 2 | This means that Object 1 and Object 2 will be displayed simultaneously for 2 s
Next-use | 0 | It means that this instruction will not be reused

Appendix B. Second Example JUMBF Boxes
Tables A5-A7 present the values of JSDB, INST, and OBMB for the first object, respectively, for the second example.

Parameter | Value | Description
… | … | This instruction is executed with the next instruction
Next-use | 0 | It means that this instruction will not be reused

Appendix C. Third Example JUMBF Boxes
Tables A8-A11 present the values of JSDB, INST, and OBMB for the first object and OBMB for the second object, respectively, for the third example.

Parameter | Value | Description
… | … | This means that Object 1 and Object 2 will be displayed simultaneously for 2 s
Next-use | 0 | It means that this instruction will not be reused

Table A10. Values of the first OBMB parameters for the third example.

Parameter | Value | Description
Toggle | 0000 0000 | This shows that no optional field is used
ID | 1 | Identifier of the box
Media type | 'image/jpg' | The embedded media is a JPEG-1 image
Location | self#jumbf = Object 1 | The image is embedded in the same file

Table A11. Values of the second OBMB parameters for the third example.

Parameter | Value | Description
Toggle | 0000 0110 | This shows that style and opacity are present
ID | 2 | Identifier of the box
Media type | 'text/utf-8' | The embedded media is a caption
Style | css_code | The style of the caption is embedded in the form of a style file
Opacity | 0.6 | The caption is 60% opaque, that is, 40% transparent
Location | self#jumbf = Object 2 | The caption is embedded in the same file
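Applying the OBMB fields of Table A11 involves two small steps: resolving the Location to an embedded box and blending the caption onto the background with the given opacity. The Python sketch below assumes that a same-file reference has the form `self#jumbf=<label>` and resolves it by exact label match within a hypothetical `boxes` dictionary; the blending step is ordinary constant-alpha compositing, C = αF + (1 − α)B, in which an opacity of 0.6 gives the caption 60% weight and the background 40%.

```python
def resolve_location(location: str, boxes: dict[str, bytes]) -> bytes:
    """Return the payload of the box named by a 'self#jumbf=<label>' location.

    Only same-file references are handled; whitespace around '=' and the
    label is ignored, and an unknown label raises KeyError.
    """
    scheme, sep, label = location.partition("=")
    if not sep or scheme.strip() != "self#jumbf":
        raise ValueError(f"not a same-file JUMBF reference: {location!r}")
    return boxes[label.strip()]


def blend(foreground: int, background: int, opacity: float) -> int:
    """Blend one 8-bit channel; opacity 1.0 is fully opaque, 0.0 fully transparent."""
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must lie in [0, 1]")
    return round(opacity * foreground + (1.0 - opacity) * background)


def blend_pixel(fg: tuple[int, ...], bg: tuple[int, ...],
                opacity: float) -> tuple[int, ...]:
    """Blend a multi-channel pixel (e.g. RGB) channel by channel."""
    return tuple(blend(f, b, opacity) for f, b in zip(fg, bg))
```

For example, blending a white caption pixel over a black background with opacity 0.6 gives 153 in each channel, i.e. 60% of full intensity.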

Appendix D. Fourth Example JUMBF Boxes
Tables A12-A15 present the values of JSDB, INST, and OBMB for the first object and OBMB for the second object, respectively, for the fourth example.

Parameter | Value | Description
… | … | This means that Object 1 and Object 2 will be displayed simultaneously for 2 s
Next-use | 0 | It means that this instruction will not be reused