AI for Multimedia Information Processing

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (10 June 2024) | Viewed by 5485

Special Issue Editor


Guest Editor
Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Narutowicza 11/12, 80-233 Gdansk, Poland
Interests: audio; broadcasting; coding; compression; digitization; mobile technologies; multimedia; positioning; signal processing; speech processing; video; wireless communication

Special Issue Information

Dear Colleagues,

Artificial Intelligence is applied all around us in our modern, digitalized society. From consumer behavior analysis to Internet profiling, it accompanies us anytime and anywhere, regardless of the type of device we use. All online activities, including web browsing, streaming, shopping, and downloading apps, leave traces of our interests and preferences, which can be used to train various tools. Whether you prefer desktop or mobile devices, AI-related technologies can enhance the pictures that you take, sharpen and stabilize recorded videos, add effects, or remove unwanted elements. These solutions can also denoise your voice calls, fine-tune prerecorded audio material, and even imitate the timbre of a celebrity. When it comes to multimedia information processing, the sky seems to be the only limit.

However, there are still numerous fields in which AI technologies could help to raise individuals' quality of life. These could include voice or video assistants for young children or the elderly, as well as pedestrian or motorized navigation. With the aid of modern tools, we could speed up medical diagnostics and design personalized therapy or employee training courses.

In this Special Issue, we invite the scientific community to publish works focused on AI for multimedia information processing.

Original research articles and reviews are welcome. Research areas may include (but are not limited to) the following:

  • AI tools and software;
  • Audio and video signal processing;
  • Coding and compression;
  • Content creation and enhancement;
  • Mobile and desktop technologies;
  • Multimedia digitization;
  • Quality of life.

I look forward to receiving your contributions.

Dr. Przemysław Falkowski-Gilski
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • AI tools and software
  • audio signal processing
  • coding and compression
  • content creation
  • content enhancement
  • desktop technologies
  • digitization
  • mobile technologies
  • multimedia
  • quality of life
  • video signal processing

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (3 papers)

Research

19 pages, 2815 KiB  
Article
Reinforcement Learning with Multi-Policy Movement Strategy for Weakly Supervised Temporal Sentence Grounding
by Shan Jiang, Yuqiu Kong, Lihe Zhang and Baocai Yin
Appl. Sci. 2024, 14(21), 9696; https://doi.org/10.3390/app14219696 - 23 Oct 2024
Viewed by 1086
Abstract
Temporal grounding involves identifying the target moment based on the provided sentence in an untrimmed video. In weakly supervised temporal grounding studies, existing temporal sentence grounding methods face challenges in (1) learning semantic alignment between the candidate window and language query and (2) identifying accurate temporal boundaries during the grounding process. In this work, we propose a reinforcement learning (RL)-based multi-policy movement framework (MMF) for weakly supervised temporal sentence grounding. We imitate the behavior of human beings when grounding specified content in a video, starting from a coarse location and then identifying fine-grained temporal boundaries. The RL-based framework initially sets a series of candidate windows and learns to adjust them step-by-step by maximizing the rewards, indicating the semantic alignment between the current window and the query. To better learn the alignment, we propose a Gaussian-based Dual-Alignment Module (GDAM) which combines the strengths of both scoring-based and reconstruction-based alignment methods, addressing the issues of negative sample bias and language bias. We also employ the multi-policy movement strategy (MMS) which grounds the temporal position in a coarse-to-fine manner. Extensive experiments demonstrate that our proposed method outperforms existing weakly supervised algorithms, achieving state-of-the-art performance on the Charades-STA and ActivityNet Captions datasets.
(This article belongs to the Special Issue AI for Multimedia Information Processing)

19 pages, 14210 KiB  
Article
Video Quality Modelling—Comparison of the Classical and Machine Learning Techniques
by Janusz Klink, Michał Łuczyński and Stefan Brachmański
Appl. Sci. 2024, 14(16), 7029; https://doi.org/10.3390/app14167029 - 10 Aug 2024
Viewed by 1086
Abstract
The classical objective methods of assessing video quality used so far, apart from their advantages, such as low costs, also have disadvantages. The need to eliminate these defects results in the search for better and better solutions. This article proposes a video quality assessment method based on machine learning using a linear regression model. A set of objective quality assessment metrics was used to train the model. The results obtained show that the prediction of video quality based on a machine learning model gives better results than the objective assessment based on individual metrics. The proposed model showed a strong correlation with the subjective user assessments but also a good fit of the regression function to the empirical data. It is an extension and improvement of the efficiency of the classical methods of objective quality assessment that have been used so far. The solution presented here will allow for a more accurate prediction of the video quality perceived by viewers based on an assessment carried out using a much cheaper, objective method.
(This article belongs to the Special Issue AI for Multimedia Information Processing)
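
The idea summarized in the abstract above, combining several objective metrics into one linear regression model that predicts the subjective score, can be illustrated with a minimal sketch. This is not the authors' implementation: the choice of PSNR and SSIM as input metrics, the toy numbers, and the function names `fit_quality_model` and `predict_quality` are all assumptions made here for illustration, and the fit uses ordinary least squares via the normal equations in plain Python.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: metrics may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_quality_model(metrics: Sequence[Sequence[float]],
                      scores: Sequence[float]) -> List[float]:
    """Least-squares fit of score ~ w0 + w1*m1 + w2*m2 + ... (normal equations)."""
    rows = [[1.0, *sample] for sample in metrics]  # leading 1.0 is the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, scores)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_quality(weights: Sequence[float],
                    metric_values: Sequence[float]) -> float:
    """Predicted subjective score for one clip from its objective metrics."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], metric_values))


# Hypothetical toy data (invented for this sketch, not taken from the paper):
# each tuple is (PSNR in dB, SSIM) for one clip, paired with a made-up score.
toy_metrics = [(25.0, 0.70), (30.0, 0.80), (32.0, 0.92), (35.0, 0.90), (40.0, 0.95)]
toy_scores = [2.1, 2.9, 3.6, 3.8, 4.4]
toy_weights = fit_quality_model(toy_metrics, toy_scores)
toy_prediction = predict_quality(toy_weights, (33.0, 0.88))
```

In practice the fitted model would be judged by its correlation with held-out subjective ratings rather than on the clips it was trained on; the normal-equation solver is used here only to keep the sketch dependency-free.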

18 pages, 457 KiB  
Article
Throughput Prediction of 5G Network Based on Trace Similarity for Adaptive Video
by Arkadiusz Biernacki
Appl. Sci. 2024, 14(5), 1962; https://doi.org/10.3390/app14051962 - 28 Feb 2024
Cited by 1 | Viewed by 2538
Abstract
Predicting throughput is essential to reduce latency in time-critical services like video streaming, which constitutes a significant portion of mobile network traffic. The video player continuously monitors network throughput during playback and adjusts the video quality according to the network conditions. This means that the quality of the video depends on the player’s ability to predict network throughput accurately, which can be challenging in the unpredictable environment of mobile networks. To improve the prediction accuracy, we grouped the throughput trace into clusters taking into account the similarity of their mean and variance. Once we distinguished the similar trace fragments, we built a separate LSTM predictive model for each cluster. For the experiment, we used traffic captured from 5G networks generated by individual user equipment (UE) in fixed and mobile scenarios. Our results show that the prior grouping of the network traces improved the prediction compared to the global model operating on the whole trace.
(This article belongs to the Special Issue AI for Multimedia Information Processing)
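
The grouping step described in the abstract above, clustering trace fragments by the similarity of their mean and variance before training one model per cluster, can be sketched as follows. The abstract does not name the clustering algorithm or the fragment length, so the fixed-length non-overlapping windows, the plain k-means procedure, and every function name here are assumptions for illustration only. The per-cluster LSTM models are not implemented; the sketch stops at producing the groups that such models would be trained on.

```python
from statistics import fmean, pvariance
from typing import Dict, List, Sequence, Tuple

Feature = Tuple[float, float]  # (mean, variance) of one trace fragment


def split_windows(trace: Sequence[float], window: int) -> List[List[float]]:
    """Cut the trace into non-overlapping fragments of equal length."""
    return [list(trace[i:i + window])
            for i in range(0, len(trace) - window + 1, window)]


def window_features(windows: Sequence[Sequence[float]]) -> List[Feature]:
    """Describe every fragment by its mean and population variance."""
    return [(fmean(w), pvariance(w)) for w in windows]


def _sq_dist(a: Feature, b: Feature) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans_labels(points: Sequence[Feature], k: int,
                  iterations: int = 50) -> List[int]:
    """Plain k-means with a deterministic start; returns one label per point."""
    ordered = sorted(points)
    # Deterministic initialisation: k points spread evenly over the sorted features.
    centroids = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
                  for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append((fmean([m[0] for m in members]),
                                fmean([m[1] for m in members])))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return labels


def group_by_cluster(trace: Sequence[float], window: int,
                     k: int) -> Dict[int, List[List[float]]]:
    """Map each cluster label to the trace fragments assigned to it."""
    windows = split_windows(trace, window)
    labels = kmeans_labels(window_features(windows), k)
    groups: Dict[int, List[List[float]]] = {}
    for w, lab in zip(windows, labels):
        groups.setdefault(lab, []).append(w)
    return groups
```

Each list returned by `group_by_cluster` would then serve as the training set for its own predictor. Because mean and variance usually live on very different scales, standardizing the two features before clustering is likely worthwhile in a real setting; that step is omitted here for brevity.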