Image and Video Processing and Retrieval Based on Machine Learning and Deep Learning

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: closed (15 August 2024) | Viewed by 2602

Special Issue Editors


Guest Editor
Department of Computer Science, Oslo Metropolitan University, N-0130 Oslo, Norway
Interests: AI and machine learning; data science; color and spectral imaging; image processing and analysis; assistive technologies; cloud computing

Guest Editor
Department of Information Security and Communication Technology, Norwegian University of Science and Technology, 2815 Gjøvik, Norway
Interests: AI and machine learning; image processing and analysis; information security

Special Issue Information

Dear Colleagues,

Image and video processing and retrieval using machine learning and deep learning constitutes a rapidly growing field of research with numerous applications in areas such as multimedia search, video surveillance, medical imaging, content-based retrieval, and computer vision.

Deep learning methods, including convolutional neural networks (CNNs), autoencoders, transformers, and generative adversarial networks (GANs), have proven to be effective in image and video processing and retrieval tasks. These techniques can recognize, classify, and generate visual content with high levels of accuracy, leading to the development of powerful algorithms.

Machine learning and deep learning techniques are useful in image and video processing applications such as image and video enhancement, restoration, segmentation, object detection and recognition, image and video classification, and content-based image and video retrieval. They have significant potential to improve and automate many aspects of image and video processing, which makes these important research areas with numerous practical applications.

Image and video retrieval involves the automatic searching and categorization of large collections of visual media. Machine learning and deep learning algorithms have made significant strides in this area, enabling the efficient and effective content-based retrieval of images and videos. As research continues in this area, the development of new algorithms and techniques is expected to result in more accurate, efficient, and reliable processing and retrieval systems for visual media, revolutionizing the way we interact with and analyze visual content.

This Special Issue of Electronics, entitled “Image and Video Processing and Retrieval Based on Machine Learning and Deep Learning”, aims to present recent research work on image processing and video processing, as well as on retrieval applications. We invite scholars to submit original research papers with novel findings as well as review articles describing the current state of the art and future perspectives in this field.

Dr. Raju Shrestha
Prof. Dr. Sule Yildirim Yayilgan
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • imaging
  • image processing
  • video processing
  • retrieval
  • machine learning
  • deep learning

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information is available in MDPI's Special Issue policies.

Published Papers (2 papers)


Research

18 pages, 4103 KiB  
Article
Content-Adaptive Bitrate Ladder Estimation in High-Efficiency Video Coding Utilizing Spatiotemporal Resolutions
by Jelena Šuljug and Snježana Rimac-Drlje
Electronics 2024, 13(20), 4049; https://doi.org/10.3390/electronics13204049 - 15 Oct 2024
Viewed by 967
Abstract
The constant increase in multimedia Internet traffic in the form of video streaming requires new solutions for efficient video coding to save bandwidth and network resources. HTTP adaptive streaming (HAS), the most widely used solution for video streaming, allows the client to adaptively select the bitrate according to the transmission conditions. For this purpose, multiple presentations of the same video content are generated on the video server, which contains video sequences encoded at different bitrates with resolution adjustment to achieve the best Quality of Experience (QoE). This set of bitrate–resolution pairs is called a bitrate ladder. In addition to the traditional one-size-fits-all scheme for the bitrate ladder, context-aware solutions have recently been proposed that enable optimum bitrate–resolution pairs for video sequences of different complexity. However, these solutions use only spatial resolution for optimization, while the selection of the optimal combination of spatial and temporal resolution for a given bitrate has not been sufficiently investigated. This paper proposes bit-ladder optimization considering spatiotemporal features of video sequences and usage of optimal spatial and temporal resolution related to video content complexity. Optimization along two dimensions of resolution significantly increases the complexity of the problem and the approach of intensive encoding for all spatial and temporal resolutions in a wide range of bitrates, for each video sequence, is not feasible in real time. In order to reduce the level of complexity, we propose a data augmentation using a neural network (NN)-based model. To train the NN model, we used seven video sequences of different content complexity, encoded with the HEVC encoder at five different spatial resolutions (SR) up to 4K. Also, all video sequences were encoded using four frame rates up to 120 fps, presenting different temporal resolutions (TR). The Structural Similarity Index Measure (SSIM) is used as an objective video quality metric. After data augmentation, we propose NN models that estimate optimal TR and bitrate values as switching points to a higher SR. These results can be further used as input parameters for the bitrate ladder construction for video sequences of a certain complexity.
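The idea of a bitrate ladder with switching points can be illustrated with a minimal sketch. It is not the authors' NN-based method: it assumes that SSIM has already been measured for every combination of bitrate, spatial resolution, and frame rate, and it simply keeps the best combination per bitrate. The measurement values and the function names `build_ladder` and `switching_points` are hypothetical and chosen only for illustration.

```python
# Hypothetical measurements: (bitrate_kbps, spatial_resolution, frame_rate_fps, ssim).
# The numbers are invented for illustration and are not taken from the paper.
measurements = [
    (1000, "1080p", 30, 0.950),
    (1000, "2160p", 30, 0.930),
    (1000, "1080p", 60, 0.940),
    (4000, "1080p", 30, 0.970),
    (4000, "2160p", 30, 0.975),
    (4000, "2160p", 60, 0.980),
    (8000, "1080p", 60, 0.978),
    (8000, "2160p", 60, 0.990),
]


def build_ladder(points):
    """For each bitrate, keep the (resolution, frame rate) pair with the highest SSIM."""
    best = {}
    for bitrate, resolution, fps, ssim in points:
        if bitrate not in best or ssim > best[bitrate][2]:
            best[bitrate] = (resolution, fps, ssim)
    # One rung per bitrate, sorted from lowest to highest bitrate.
    return [(bitrate, *best[bitrate]) for bitrate in sorted(best)]


def switching_points(ladder):
    """Return the bitrates at which the selected spatial resolution changes."""
    return [cur[0] for prev, cur in zip(ladder, ladder[1:]) if cur[1] != prev[1]]


ladder = build_ladder(measurements)
for rung in ladder:
    print(rung)
print("switch to a different SR at (kbps):", switching_points(ladder))
```

An exhaustive table like `measurements` is exactly what the abstract describes as infeasible in real time, which is why the paper estimates these values with a model instead of encoding every combination.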

19 pages, 1012 KiB  
Article
Rapid CU Partitioning and Joint Intra-Frame Mode Decision Algorithm
by Wenjun Song, Congxian Li and Qiuwen Zhang
Electronics 2024, 13(17), 3465; https://doi.org/10.3390/electronics13173465 - 31 Aug 2024
Viewed by 965
Abstract
H.266/Versatile Video Coding (VVC) introduces new techniques that build upon previous standards, proposing a quadtree with nested multi-type tree (QTMT). The introduction of this structure significantly enhances video coding efficiency; additionally, the number of directional modes in H.266 has increased by 32 compared to H.265, accommodating a greater variety of texture patterns. However, the changes in the related structures have also led to a significant increase in encoding complexity. To address the issue of excessive computational complexity, this paper proposes a rapid coding unit (CU) partitioning approach combined with an intra-frame mode decision algorithm. In the first phase of the algorithm, we extract different features for CU blocks of various sizes and input them into a decision tree classifier, which determines the CU partitioning mode so that partitioning can be terminated early, thereby reducing the encoding complexity to some extent. In the second phase of the algorithm, we put forward an intra-frame mode decision strategy grounded in gradient descent techniques with a bidirectional search mode. This approaches the global optimum as closely as possible, thereby obtaining the optimal intra-frame mode and further reducing the encoding complexity. Experimentation has demonstrated that the algorithm achieves a 54.53% reduction in encoding time, while the BD-BR (Bjøntegaard delta bit rate) increases by only 1.38%, striking an optimal balance between the fidelity of video and the efficacy of the encoding process.
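The second phase can be pictured as a discrete descent over intra mode indices. The sketch below is not the authors' algorithm, whose details are not given in the abstract. It assumes a hypothetical cost function `toy_cost` standing in for the encoder's rate-distortion cost, and it searches only the VVC angular modes 2 to 66. From a starting mode it looks one step in both directions, moves to the cheaper neighbour, and stops when neither neighbour improves the cost.

```python
def bidirectional_mode_search(cost, start, lo=2, hi=66, step=1):
    """Descend over mode indices by comparing the neighbours on both sides.

    Returns the mode at which the search stopped and the number of cost
    evaluations performed. The result is a local minimum of `cost`.
    """
    current = start
    current_cost = cost(current)
    evaluated = 1
    while True:
        # Candidate modes one step to the left and right, clipped to the valid range.
        neighbours = [m for m in (current - step, current + step) if lo <= m <= hi]
        costs = {m: cost(m) for m in neighbours}
        evaluated += len(neighbours)
        best = min(costs, key=costs.get)
        if costs[best] >= current_cost:
            return current, evaluated  # no neighbour is cheaper: stop here
        current, current_cost = best, costs[best]


def toy_cost(mode):
    """Hypothetical stand-in for a rate-distortion cost, minimal at mode 34."""
    return (mode - 34) ** 2


best_mode, n_evals = bidirectional_mode_search(toy_cost, start=30)
print(best_mode, n_evals)
```

With a cost that has a single minimum, the search visits far fewer than the 65 angular modes. With several local minima it can stop early at a suboptimal mode, so a real encoder would need good starting candidates.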
