Advances and Applications of Deep Learning Methods and Image Processing

A special issue of Big Data and Cognitive Computing (ISSN 2504-2289).

Deadline for manuscript submissions: 16 July 2025 | Viewed by 20,204

Special Issue Editors


Guest Editor
MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Cambridge CB2 1TN, UK
Interests: machine learning; AI

Guest Editor
Department of Engineering Science, University of Oxford, Parks Road, Oxford OX1 3PJ, UK
Interests: machine learning; AI

Special Issue Information

Dear Colleagues,

Deep learning has revolutionized the field of image processing since its emergence, rendering manual feature extraction techniques largely obsolete. Deep learning (DL) has been one of the fastest-growing areas in artificial intelligence and data science, with rapidly emerging applications spanning road-scene understanding, medical image analysis, plant phenotyping, textual data modalities, and more. By automatically identifying features, DL allows researchers to tackle a wide variety of signal- and information-processing tasks. This Special Issue explores the application of deep learning and artificial neural networks to analyze data and make predictions across a wide range of industries and fields, and will present a wide variety of deep learning applications, from solving complex data science problems to developing automatic user controls.

The main objective is to bring together deep learning researchers from different disciplines to discuss new ideas, research questions, recent results, and challenges in this emerging area.

Potential topics include but are not limited to:

  • Deep learning models for medical image analysis (healthcare);
  • Deep learning models for plant phenotyping;
  • Deep learning models for road scene understanding (self-driving);
  • Deep learning models for fighting deepfakes;
  • Deep learning models for pixel restoration;
  • Deep learning models for video gaming;
  • Deep learning models for online marketing support;
  • Deep learning models for user behavior analysis.

In this Special Issue, original research articles and reviews are welcome.

We look forward to receiving your contributions.

Dr. Robail Yasrab
Dr. Md Mostafa Kamal Sarker
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Big Data and Cognitive Computing is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • image analysis
  • scene understanding

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (8 papers)


Research


19 pages, 5278 KiB  
Article
A Novel Method for Improving Baggage Classification Using a Hyper Model of Fusion of DenseNet-161 and EfficientNet-B5
by Mohammed Ali Saleh, Mohamed Abdouh and Mohamed K. Ramadan
Big Data Cogn. Comput. 2024, 8(10), 135; https://doi.org/10.3390/bdcc8100135 - 11 Oct 2024
Viewed by 889
Abstract
In response to rising concerns over crime rates, there has been an increasing demand for automated video surveillance systems that are capable of detecting human activities involving carried objects. This paper proposes a hyper-model ensemble to classify humans carrying baggage based on the type of bags they are carrying. The Fastai framework is leveraged for its computational prowess, user-friendly workflow, and effective data-cleansing capabilities. The PETA dataset is utilized and automatically re-annotated into five classes based on the baggage type, including Carrying Backpack, Carrying Luggage Case, Carrying Messenger Bag, Carrying Nothing, and Carrying Other. The classification task employs two pretrained models, DenseNet-161 and EfficientNet-B5, with a hyper-model ensemble that combines them to enhance accuracy. A “fit-one-cycle” strategy was implemented to reduce the training time and improve accuracy. The proposed hyper-model ensemble has been experimentally validated and compared to existing methods, demonstrating an accuracy of 98.6% that exceeds current approaches in terms of accuracy, macro-F1, and micro-F1. DenseNet-161 and EfficientNet-B5 have achieved accuracy rates of 95.5% and 97.3%, respectively. These findings contribute to expanding research on automated video surveillance systems, and the proposed model holds promise for further development and use in diverse applications. Full article
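The paper's core idea, fusing DenseNet-161 and EfficientNet-B5 predictions into a hyper-model, can be sketched as a soft-voting ensemble over class probabilities. The equal weighting and toy logits below are illustrative assumptions, not the authors' exact fusion rule:

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax over the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ensemble_predict(logits_a, logits_b, weights=(0.5, 0.5)):
    """Average the class probabilities of two backbones (soft voting)."""
    probs = weights[0] * softmax(logits_a) + weights[1] * softmax(logits_b)
    return probs.argmax(axis=-1), probs

# Toy logits for a batch of 2 crops over the 5 baggage classes
logits_densenet = np.array([[2.0, 0.1, 0.3, 0.2, 0.1],
                            [0.2, 0.1, 1.5, 0.4, 0.3]])
logits_effnet = np.array([[1.8, 0.2, 0.2, 0.3, 0.2],
                          [0.1, 0.2, 1.9, 0.2, 0.1]])
labels, probs = ensemble_predict(logits_densenet, logits_effnet)
```

Because each softmax output sums to 1 and the weights sum to 1, the fused rows remain valid probability distributions.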

40 pages, 4095 KiB  
Article
An End-to-End Scene Text Recognition for Bilingual Text
by Bayan M. Albalawi, Amani T. Jamal, Lama A. Al Khuzayem and Olaa A. Alsaedi
Big Data Cogn. Comput. 2024, 8(9), 117; https://doi.org/10.3390/bdcc8090117 - 9 Sep 2024
Viewed by 1031
Abstract
Text localization and recognition from natural scene images have gained a lot of attention recently due to their crucial role in various applications, such as autonomous driving and intelligent navigation. However, two significant gaps exist in this area: (1) prior research has primarily focused on recognizing English text, whereas Arabic text has been underrepresented, and (2) most prior research has adopted separate approaches for scene text localization and recognition, as opposed to one integrated framework. To address these gaps, we propose a novel bilingual end-to-end approach that localizes and recognizes both Arabic and English text within a single natural scene image. Specifically, our approach utilizes pre-trained CNN models (ResNet and EfficientNetV2) with kernel representation for text localization and RNN models (LSTM and BiLSTM) with an attention mechanism for text recognition. In addition, the AraElectra Arabic language model was incorporated to enhance Arabic text recognition. Experimental results on the EvArest, ICDAR2017, and ICDAR2019 datasets demonstrated that our model achieves superior performance not only in recognizing horizontally oriented text but also in recognizing multi-oriented and curved Arabic and English text in natural scene images. Full article
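The attention mechanism paired with the RNN recognizer can be illustrated with a minimal Bahdanau-style additive attention step in NumPy. The dimensions and weight names (`Wa`, `Ua`, `va`) are generic assumptions, not the paper's exact configuration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def additive_attention(enc_states, dec_state, Wa, Ua, va):
    """score_t = va . tanh(Wa h_t + Ua s); context = sum_t alpha_t h_t."""
    scores = np.array([va @ np.tanh(Wa @ h + Ua @ dec_state) for h in enc_states])
    alpha = softmax(scores)                       # attention weights over time
    context = (alpha[:, None] * enc_states).sum(axis=0)
    return context, alpha

rng = np.random.default_rng(0)
T, d_enc, d_dec, d_att = 6, 8, 8, 4               # 6 encoder time steps
enc = rng.normal(size=(T, d_enc))                 # CNN feature columns
dec = rng.normal(size=d_dec)                      # current decoder state
Wa = rng.normal(size=(d_att, d_enc))
Ua = rng.normal(size=(d_att, d_dec))
va = rng.normal(size=d_att)
context, alpha = additive_attention(enc, dec, Wa, Ua, va)
```

The decoder would consume `context` at each output step, letting it focus on different character regions of the feature sequence.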

15 pages, 1856 KiB  
Article
DaSAM: Disease and Spatial Attention Module-Based Explainable Model for Brain Tumor Detection
by Sara Tehsin, Inzamam Mashood Nasir, Robertas Damaševičius and Rytis Maskeliūnas
Big Data Cogn. Comput. 2024, 8(9), 97; https://doi.org/10.3390/bdcc8090097 - 25 Aug 2024
Cited by 3 | Viewed by 1453
Abstract
Brain tumors are the result of irregular development of cells and are a major cause of adult deaths worldwide. Many such deaths can be avoided with early brain tumor detection. Magnetic resonance imaging (MRI), the most common method of diagnosing brain tumors, may improve a patient's chance of survival through earlier diagnosis, and the improved visibility of malignancies in MRI makes therapy easier. Numerous deep learning models have been proposed over the last decade, including AlexNet, VGG, Inception, ResNet, and DenseNet. All of these models are trained on a huge dataset, ImageNet. These general models have many parameters, many of which become irrelevant when the models are applied to a specific problem. This study uses a custom deep-learning model for the classification of brain MRIs. The proposed Disease and Spatial Attention Model (DaSAM) has two modules: (a) the Disease Attention Module (DAM), to distinguish between disease and non-disease regions of an image, and (b) the Spatial Attention Module (SAM), to extract important features. The experiments of the proposed model are conducted on two publicly available multi-class datasets, the Figshare and Kaggle datasets, where it achieves precision values of 99% and 96%, respectively. The proposed model is also tested using cross-dataset validation, where it achieved 85% accuracy when trained on the Figshare dataset and validated on the Kaggle dataset. The incorporation of the DAM and SAM modules enabled feature mapping, which proved useful for highlighting important features during the model's decision-making process. Full article
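The idea behind a spatial attention module, reweighting feature-map locations so salient regions dominate, can be sketched in a few lines. The fixed scalar weights below stand in for the learned convolution a real module would use; this is a generic sketch, not the paper's DaSAM implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(feat, w_avg=1.0, w_max=1.0):
    """feat: (C, H, W) feature map. Pool across channels, then gate each
    spatial location with a sigmoid mask in (0, 1)."""
    avg_pool = feat.mean(axis=0)                  # (H, W)
    max_pool = feat.max(axis=0)                   # (H, W)
    mask = sigmoid(w_avg * avg_pool + w_max * max_pool)
    return feat * mask[None, :, :], mask

rng = np.random.default_rng(1)
feat = rng.normal(size=(16, 8, 8))                # toy 16-channel feature map
out, mask = spatial_attention(feat)
```

The mask doubles as an interpretability artifact: visualizing it shows which image regions drove the prediction, which is the "feature mapping" benefit the abstract describes.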

20 pages, 70388 KiB  
Article
Analyzing the Attractiveness of Food Images Using an Ensemble of Deep Learning Models Trained via Social Media Images
by Tanyaboon Morinaga, Karn Patanukhom and Yuthapong Somchit
Big Data Cogn. Comput. 2024, 8(6), 54; https://doi.org/10.3390/bdcc8060054 - 27 May 2024
Viewed by 1057
Abstract
With the growth of digital media and social networks, sharing visual content has become common in people’s daily lives. In the food industry, visually appealing food images can attract attention, drive engagement, and influence consumer behavior. Therefore, it is crucial for businesses to understand what constitutes attractive food images. Assessing the attractiveness of food images poses significant challenges due to the lack of large labeled datasets that align with diverse public preferences. Additionally, it is challenging for computer assessments to approach human judgment in evaluating aesthetic quality. This paper presents a novel framework that circumvents the need for explicit human annotation by leveraging user engagement data that are readily available on social media platforms. We propose procedures to collect, filter, and automatically label the attractiveness classes of food images based on their user engagement levels. The data gathered from social media are used to create predictive models for category-specific attractiveness assessments. Our experiments across five food categories demonstrate the efficiency of our approach. The experimental results show that our proposed user-engagement-based attractiveness class labeling achieves a high consistency of 97.2% compared to human judgments obtained through A/B testing. Separate attractiveness assessment models were created for each food category using convolutional neural networks (CNNs). When analyzing unseen food images, our models achieve a consistency of 76.0% compared to human judgments. The experimental results suggest that the food image dataset collected from social networks, using the proposed framework, can be successfully utilized for learning food attractiveness assessment models. Full article
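The engagement-based labeling pipeline can be sketched as normalizing raw engagement by audience size and thresholding into classes. The thresholds and three-level scheme below are illustrative assumptions; the paper derives its classes per food category from collected data:

```python
def engagement_rate(likes, comments, followers):
    """Normalise raw engagement by audience size."""
    return (likes + comments) / max(followers, 1)

def attractiveness_label(rate, low=0.01, high=0.05):
    """Map an engagement rate to a class label (thresholds are illustrative,
    not the paper's learned category-specific cutoffs)."""
    if rate >= high:
        return "high"
    if rate >= low:
        return "medium"
    return "low"

label = attractiveness_label(engagement_rate(600, 40, 10_000))  # rate = 0.064
```

Labels produced this way require no human annotation, which is exactly how the framework sidesteps the missing-dataset problem.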

17 pages, 33366 KiB  
Article
Sign-to-Text Translation from Panamanian Sign Language to Spanish in Continuous Capture Mode with Deep Neural Networks
by Alvaro A. Teran-Quezada, Victor Lopez-Cabrera, Jose Carlos Rangel and Javier E. Sanchez-Galan
Big Data Cogn. Comput. 2024, 8(3), 25; https://doi.org/10.3390/bdcc8030025 - 26 Feb 2024
Cited by 1 | Viewed by 2199
Abstract
Convolutional neural networks (CNNs) have provided great advances for the task of sign language recognition (SLR), while recurrent neural networks (RNNs) in the form of long short-term memory (LSTM) have become a means of solving problems involving sequential data. This research proposes the development of a sign language translation system that converts Panamanian Sign Language (PSL) signs into Spanish text using an LSTM model that, among other things, makes it possible to work with non-static signs (as sequential data). The deep learning model presented focuses on action detection, in this case the execution of the signs, which involves precisely processing the frames in which a sign language gesture is made. The proposal is a holistic solution that considers, in addition to tracking the speaker's hands, the face and pose determinants; these were added because, when communicating through sign languages, visual characteristics beyond hand gestures also matter. For the training of this system, a dataset of 330 videos (of 30 frames each) for five possible classes (different signs considered) was created. The model was tested with an accuracy of 98.8%, making this a valuable base system for effective communication between PSL users and Spanish speakers. In conclusion, this work improves the state of the art for PSL–Spanish translation by exploiting the possibilities of translatable signs via deep learning. Full article
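The "holistic" per-frame input (hands plus face and pose) can be sketched as concatenating landmark groups into one feature vector per frame, then stacking 30 frames into the sequence the LSTM consumes. The landmark counts below follow MediaPipe Holistic, a plausible extractor for this setup, and are an assumption rather than the paper's confirmed pipeline:

```python
import numpy as np

# Per-frame feature sizes: pose (x, y, z, visibility), face and hands (x, y, z)
POSE = 33 * 4
FACE = 468 * 3
HAND = 21 * 3

def frame_features(pose, face, left_hand, right_hand):
    """Concatenate all landmark groups, zero-filling any that were not detected
    (hands and faces routinely drop out of frame during signing)."""
    parts = [pose if pose is not None else np.zeros(POSE),
             face if face is not None else np.zeros(FACE),
             left_hand if left_hand is not None else np.zeros(HAND),
             right_hand if right_hand is not None else np.zeros(HAND)]
    return np.concatenate(parts)

# One 30-frame clip, matching the dataset layout (330 videos x 30 frames, 5 classes)
clip = np.stack([frame_features(None, None, None, None) for _ in range(30)])
```

Each clip becomes a fixed-shape `(30, 1662)` tensor, so clips batch cleanly for LSTM training regardless of which landmarks were visible.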

14 pages, 3229 KiB  
Article
Hand Gesture Recognition Using Automatic Feature Extraction and Deep Learning Algorithms with Memory
by Rubén E. Nogales and Marco E. Benalcázar
Big Data Cogn. Comput. 2023, 7(2), 102; https://doi.org/10.3390/bdcc7020102 - 23 May 2023
Cited by 12 | Viewed by 6253
Abstract
Gesture recognition is widely used to express emotions or to communicate with other people or machines. Hand gesture recognition is a problem of great interest to researchers because it is a high-dimensional pattern recognition problem, and this high dimensionality directly affects the performance of machine learning models. The dimensionality problem can be addressed through feature selection and feature extraction. To this end, models with manual feature extraction and with automatic feature extraction were evaluated. The manual feature extraction was performed using statistical functions of central tendency, while the automatic extraction was performed by means of a CNN and a BiLSTM. These features were also evaluated in classifiers such as Softmax, ANN, and SVM. The best-performing model was the combination of BiLSTM and ANN (BiLSTM-ANN), with an accuracy of 99.9912%. Full article
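The manual baseline, extracting central-tendency statistics per channel from a signal window, can be sketched directly. The window shape and the exact statistic set are illustrative assumptions:

```python
import numpy as np

def manual_features(window):
    """window: (T, channels) of sensor samples -> one feature vector of
    central-tendency statistics per channel (mean, median, std, max, min)."""
    return np.concatenate([window.mean(axis=0),
                           np.median(window, axis=0),
                           window.std(axis=0),
                           window.max(axis=0),
                           window.min(axis=0)])

rng = np.random.default_rng(2)
window = rng.normal(size=(200, 8))   # 200 samples from 8 sensor channels
feats = manual_features(window)      # 5 statistics x 8 channels = 40 features
```

Collapsing 1600 raw samples to 40 summary features is exactly the dimensionality reduction the abstract contrasts with learned CNN/BiLSTM features.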

12 pages, 2067 KiB  
Article
Classification of Microbiome Data from Type 2 Diabetes Mellitus Individuals with Deep Learning Image Recognition
by Juliane Pfeil, Julienne Siptroth, Heike Pospisil, Marcus Frohme, Frank T. Hufert, Olga Moskalenko, Murad Yateem and Alina Nechyporenko
Big Data Cogn. Comput. 2023, 7(1), 51; https://doi.org/10.3390/bdcc7010051 - 17 Mar 2023
Cited by 4 | Viewed by 3550
Abstract
Microbiomic analysis of human gut samples is a beneficial tool to examine the general well-being and various health conditions. The balance of the intestinal flora is important to prevent chronic gut infections and adiposity, as well as pathological alterations connected to various diseases. The evaluation of microbiome data based on next-generation sequencing (NGS) is complex and their interpretation is often challenging and can be ambiguous. Therefore, we developed an innovative approach for the examination and classification of microbiomic data into healthy and diseased by visualizing the data as a radial heatmap in order to apply deep learning (DL) image classification. The differentiation between 674 healthy and 272 type 2 diabetes mellitus (T2D) samples was chosen as a proof of concept. The residual network with 50 layers (ResNet-50) image classification model was trained and optimized, providing discrimination with 96% accuracy. Samples from healthy persons were detected with a specificity of 97% and those from T2D individuals with a sensitivity of 92%. Image classification using DL of NGS microbiome data enables precise discrimination between healthy and diabetic individuals. In the future, this tool could enable classification of different diseases and imbalances of the gut microbiome and their causative genera. Full article
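The paper's key move, turning a 1-D abundance profile into a 2-D image a CNN can classify, can be sketched by painting each taxon's abundance into an angular sector of a polar image. This is a simplified illustration of the radial-heatmap idea, not the authors' exact rendering:

```python
import numpy as np

def radial_heatmap(abundances, size=64):
    """Paint each taxon's relative abundance into an angular sector of a
    polar image, so a 1-D profile becomes a 2-D CNN input."""
    n = len(abundances)
    img = np.zeros((size, size))
    cy = cx = (size - 1) / 2.0
    for y in range(size):
        for x in range(size):
            r = np.hypot(y - cy, x - cx)
            if r <= size / 2.0:                       # inside the disc
                theta = np.arctan2(y - cy, x - cx) % (2 * np.pi)
                img[y, x] = abundances[int(theta / (2 * np.pi) * n) % n]
    return img

# Four taxa with relative abundances summing to 1
img = radial_heatmap([0.5, 0.3, 0.15, 0.05])
```

An image classifier such as ResNet-50 can then be fine-tuned on these renderings exactly as it would be on photographs.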

Review


36 pages, 4696 KiB  
Review
Review of Federated Learning and Machine Learning-Based Methods for Medical Image Analysis
by Netzahualcoyotl Hernandez-Cruz, Pramit Saha, Md Mostafa Kamal Sarker and J. Alison Noble
Big Data Cogn. Comput. 2024, 8(9), 99; https://doi.org/10.3390/bdcc8090099 - 28 Aug 2024
Viewed by 1678
Abstract
Federated learning is an emerging technology that enables the decentralised training of machine learning-based methods for medical image analysis across multiple sites while ensuring privacy. This review paper thoroughly examines federated learning research applied to medical image analysis, outlining technical contributions. We followed the review methodology of Okoli and Schabram to produce a comprehensive summary and discussion of the literature in information systems. Searches were conducted at leading indexing platforms: PubMed, IEEE Xplore, Scopus, ACM, and Web of Science. We found a total of 433 papers and selected 118 of them for further examination. The findings highlighted research on applying federated learning to neural network methods in cardiology, dermatology, gastroenterology, neurology, oncology, respiratory medicine, and urology. The main challenges reported were the ability of machine learning models to adapt effectively to real-world datasets and privacy preservation. We outlined two strategies to address these challenges: non-independent and identically distributed data and privacy-enhancing methods. This review paper offers a reference overview for those already working in the field and an introduction to those new to the topic. Full article
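The aggregation step at the heart of most methods this review surveys, federated averaging (FedAvg), can be sketched in a few lines: each site trains locally, then a server averages the parameters weighted by local dataset size. The site names and sizes below are illustrative:

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """One FedAvg round: average each parameter tensor across clients,
    weighted by local dataset size, without sharing any raw images."""
    total = float(sum(client_sizes))
    return [sum((n / total) * params[k]
                for n, params in zip(client_sizes, client_params))
            for k in range(len(client_params[0]))]

# Two hospitals with different amounts of local data (one toy weight tensor each)
site_a = [np.array([0.0, 2.0])]
site_b = [np.array([4.0, 2.0])]
global_params = fedavg([site_a, site_b], client_sizes=[1, 3])
```

Only parameters leave each site, which is the privacy property motivating the medical applications; the non-IID data challenge the review discusses arises precisely because each site's local distribution pulls this average in a different direction.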
