Next-Generation Machine Learning and Deep Learning Models for Complex Data, Vision, and Intelligent Applications

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: 15 January 2027 | Viewed by 4522

Special Issue Editor


Dr. Yoichi Tomioka
Guest Editor
Department of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu 965-8580, Japan
Interests: image processing; computer vision; image forensics; fault-tolerant AI; circuit designs

Special Issue Information

Dear Colleagues,

The research community has shown a burgeoning interest in applying machine learning (ML) and deep learning (DL) methodologies to solve complex real-world problems across diverse fields. These advancements in machine learning continuously present new challenges and innovative solutions for a variety of intricate issues in applications, technologies, and theoretical constructs. Deep learning, a critical subset of machine learning, focuses on learning hierarchical representations of input data via multiple non-linear layers. In recent years, DL techniques have seen widespread application, achieving remarkable success across various domains.

In this Special Issue, we invite researchers and practitioners in the fields of machine learning and deep learning to disseminate their original and innovative ideas. We welcome submissions that include both theoretical advancements and practical applications aimed at solving complex data-related challenges using ML and DL algorithms. Topics of interest for this collection include, but are not limited to, the following:

  • Advanced machine learning models and deep learning architectures (e.g., CNN, RNN, LSTM, GNN, Transfer Learning, Attention, and GCN);
  • Novel methods for feature extraction and selection;
  • Image analysis techniques (segmentation, classification, retrieval, and generation);
  • Human action and gesture recognition;
  • Development of intelligent and user-friendly interfaces;
  • Handwriting analysis and recognition;
  • Applications in healthcare and medical image analysis;
  • Bioinformatics and computer vision;
  • Explainable artificial intelligence (XAI).

Dr. Yoichi Tomioka
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning (ML)
  • deep learning (DL)
  • computer vision
  • image segmentation
  • feature extraction
  • explainable AI (XAI)

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (9 papers)

Research

29 pages, 1438 KB  
Article
Stability-Driven Feature Extraction–Kolmogorov–Arnold Network-Driven Ensemble Framework for Reliable Breast Cancer Detection
by Abdul Rahaman Wahab Sait and Yazeed Alkhurayyif
Electronics 2026, 15(10), 2207; https://doi.org/10.3390/electronics15102207 - 20 May 2026
Viewed by 103
Abstract
Breast cancer screening is a fundamentally probabilistic diagnostic task that requires precise identification of complex imaging characteristics from diverse patient cohorts. Despite improvements in deep learning techniques, current automatic tools are typically trained on well-curated datasets and do not generalize to heterogeneous data, thereby limiting their application. This study aims to address these shortcomings by introducing a more effective and generalizable framework for breast cancer classification that focuses on the stability of features, the learning of complementary representations, and improved decision modeling. The proposed methodology incorporates stability-driven feature extraction (SDFE) with a multi-branch architecture that consists of EfficientNetV2 (Convolutional neural networks (CNNs)), EfficientFormer (Vision transformers (ViTs)), and multi-layer perceptron (MLP)-Mixer models to extract various feature representations. To improve non-linear decision boundaries, it uses a Kolmogorov–Arnold Network (KAN)-based classification head and selects the most credible prediction via an adaptive voting mechanism. This model is trained using patient-level splitting on the VinDr-Mammo dataset, evaluated using five-fold cross-validation, and subsequently externally validated on the CBIS-DDSM dataset. Experimental findings demonstrate the consistent performance of the proposed model, with accuracies of 94.5% in cross-validation, 93.3% on the VinDr-Mammo test set, and 94.6% on CBIS-DDSM, surpassing other recent state-of-the-art solutions. It demonstrates enhanced robustness and cross-dataset generalization, offering a scalable, consistent framework for breast cancer classification that supports the development of computer-aided diagnostic systems. Full article
25 pages, 1425 KB  
Article
Quantitative Evaluation of Personality-Driven Short Dialogue Generation for Game NPCs Based on the Five-Factor Model
by Kanon Sasaki, Sota Kawaguchi, Sakura Miyano and Shun Nishide
Electronics 2026, 15(10), 2030; https://doi.org/10.3390/electronics15102030 - 10 May 2026
Viewed by 332
Abstract
Personality-driven dialogue generation is essential for creating believable non-player characters (NPCs) in games. This study aims to (1) generate short NPC-like dialogue conditioned on predefined personality traits and (2) quantitatively evaluate whether the generated dialogue accurately reflects those traits. To achieve this, we propose a framework based on the OCEAN personality model for both controlled dialogue generation and systematic evaluation of personality consistency. We construct 32 personality configurations and generate responses to five scenario-based prompts using three models: Zephyr-7b, OpenChat-3.5-0106, and an Ollama-based OpenLLaMA-3B model. Personality consistency is evaluated using two complementary approaches: classification-based metrics (precision, recall, and F1-score) and score-based aggregation that measures alignment with intended personality traits. In addition, stability is introduced to quantify variability across multiple generated responses. The results suggest that the proposed framework supports a more structured comparison between high- and low-trait configurations within this controlled automated evaluation setting. OpenChat showed the highest performance in the automated evaluation, with F1-scores of 0.893 (high-trait) and 0.900 (low-trait), and the highest aggregated score of 340.94. Zephyr demonstrated strong stability (8.21) and consistent controllability, while the Ollama-based model showed lower consistency (F1: 0.715/0.743, score: 286.99) but substantially faster generation (0.57 s per response). Human validation on a representative subset supported the broad model-level tendency that OpenChat and Zephyr conveyed personality cues more clearly than Ollama, while the difference between OpenChat and Zephyr was less clear in human judgments. Full article
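
The classification-based metrics named in this abstract (precision, recall, and F1-score) can be illustrated with a minimal sketch. The function name `precision_recall_f1` and the label lists are hypothetical and are not taken from the paper; the sketch only shows the standard definitions, assuming each generated response is labeled as high-trait (1) or low-trait (0).

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) with one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up labels for illustration: 1 = high trait, 0 = low trait.
intended = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 1]
precision, recall, f1 = precision_recall_f1(intended, predicted)
```

Passing `positive=0` scores the low-trait class instead, which is one plausible way to obtain separate high-trait and low-trait F1 values like those reported above.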

20 pages, 3175 KB  
Article
Multimodal Automatic Music Transcription Using Piano Audio and Hand-Skeleton Information
by Kosuke Yamada, Satoshi Nishimura and Jungpil Shin
Electronics 2026, 15(10), 2005; https://doi.org/10.3390/electronics15102005 - 8 May 2026
Viewed by 347
Abstract
Automatic Music Transcription (AMT) for piano is difficult for audio-only systems due to dense polyphony, resonance, and reverberation, which lead to false positives and unstable onset decisions. We present a multimodal AMT framework that fuses Omnizart audio probability maps with visual cues from hand-skeleton tracking. A graph-based model called HandSkeletonNet estimates per-key onset probabilities from hand trajectories, and the two modalities are merged via a weighting-and-masking scheme or a compact CNN-based merger. Experiments show consistent improvements over the audio-only baseline on our self-compiled dataset, while evaluations with external datasets primarily improve frame-level sensitivity. The frame-level F1 score improved from 75.12% to 75.76% for the PianoYT dataset and from 54.68% to 57.57% for the PianoVAM dataset compared with the audio-only baseline. Our experiments also reveal limited onset-level gains under domain shift. Remaining errors are largely explained by timing/misalignment and note fragmentation in MIDI decoding, suggesting that robustness to missing hand detections and explicit temporal alignment are key directions. Full article

20 pages, 959 KB  
Article
Skin Cancer Disease Detection Using Two-Stream Hybrid Attention-Based Deep Learning Model
by Abu Saleh Musa Miah, Koki Hirooka, Najmul Hassan and Jungpil Shin
Electronics 2026, 15(8), 1761; https://doi.org/10.3390/electronics15081761 - 21 Apr 2026
Viewed by 611
Abstract
Skin cancer represents a significant public health challenge, necessitating early detection and timely treatment for optimal management. Timely and accurate evaluation of skin lesions is crucial, as delays can lead to more severe outcomes. However, identifying skin lesions accurately can be challenging due to differences in color, shape, and the various types of imaging equipment used for diagnosis. While recent studies have demonstrated the potential of ensemble convolutional neural networks (CNNs) for early diagnosis of skin disorders, these models are often too large and inefficient for processing contextual information. Although lightweight networks like MobileNetV3 and EfficientNet have been developed to reduce parameters and enable deep neural networks on mobile devices, their performance is limited by inadequate feature representation depth. To mitigate these limitations, we propose a new hybrid attention dual-stream deep learning model for skin lesion detection. Our model uses one training process to preprocess the images and splits the task into two branches. Each branch extracts different features using multi-stage and multi-branch attention techniques, improving the model’s ability to detect skin lesions accurately. The first branch processes the original image using a convolutional layer integrated with three novel attention modules: Enhanced Separable Depthwise Convolution (SCAttn), stage attention, and branch attention. The second branch utilizes Contrast Limited Adaptive Histogram Equalization (CLAHE) to enhance the input image, improving local contrast and revealing finer details. The integration of CLAHE with SCAttn modules leverages enhanced local contrast to capture more nuanced features while maintaining computational efficiency. A classification module receives the concatenated hierarchical characteristics that were taken from both branches. 
Utilizing the PAD2020 and ISIC 2019 datasets, we assessed the proposed model and obtained an accuracy rate of 98.59% for PAD2020, surpassing the state-of-the-art performance by 2%, and stable performance accuracy for the ISIC 2019 dataset. This illustrates how well the model can integrate several attention mechanisms and feature enhancement methods, providing a reliable and effective means of detecting skin cancer. Full article

29 pages, 2340 KB  
Article
Target-Aware Bilingual Stance Detection in Social Media Using Transformer Architecture
by Abdul Rahaman Wahab Sait and Yazeed Alkhurayyif
Electronics 2026, 15(4), 830; https://doi.org/10.3390/electronics15040830 - 14 Feb 2026
Viewed by 442
Abstract
Stance detection has emerged as an essential tool in natural language processing for understanding how individuals express agreement, disagreement, or neutrality toward specific targets in social and online discourse. It plays a crucial role in bilingual and multilingual environments, including English-Arabic social media ecosystems, where differences in language structure, discourse style, and data availability pose significant challenges for reliable stance modelling. Existing approaches often struggle with target awareness, cross-lingual generalization, robustness to noisy user-generated text, and the interpretability of model decisions. This study aims to build a reliable, explainable target-aware bilingual stance-detection framework that generalizes across heterogeneous stance formats and languages without retraining on a dataset specific to the target language. Thus, a unified dual-encoder architecture based on mDeBERTa-v3 is proposed. Cross-language contrastive learning offers an auxiliary training objective to align English and Arabic stance representations in a common semantic space. Robustness-oriented regularization is used to mitigate the effects of informal language, vocabulary variation, and adversarial noise. To promote transparency and trustworthiness, the framework incorporates token-level rationale extraction, enables fine-grained interpretability, and supports analysis of hallucination. The proposed model is tested on a combined bilingual test set and two structurally distinct zero-shot benchmarks: MT-CSD and AraStance. Experimental results show consistent performance, with accuracies of 85.0% and 86.8% and F1-scores of 84.7% and 86.8% on the zero-shot benchmarks, confirming stable performance and realistic generalization. Ultimately, these findings reveal that effective bilingual stance detection can be achieved via explicit target conditioning, cross-lingual alignment, and explainability-driven design. Full article

19 pages, 3006 KB  
Article
From Quality Grading to Defect Recognition: A Dual-Pipeline Deep Learning Approach for Automated Mango Assessment
by Shinfeng Lin and Hongting Chiu
Electronics 2026, 15(3), 549; https://doi.org/10.3390/electronics15030549 - 27 Jan 2026
Viewed by 450
Abstract
Mango is a high-value agricultural commodity, and accurate and efficient appearance quality grading and defect inspection are critical for export-oriented markets. This study proposes a dual-pipeline deep learning framework for automated mango assessment, in which surface defect classification and quality grading are jointly implemented within a unified inspection system. For defect assessment, the task is formulated as a multi-label classification problem involving five surface defect categories, eliminating the need for costly bounding box annotations required by conventional object detection models. To address the severe class imbalance commonly encountered in agricultural datasets, a copy–paste-based image synthesis strategy is employed to augment scarce defect samples. For quality grading, mangoes are categorized into three quality levels. Unlike conventional CNN-based approaches relying solely on spatial-domain information, the proposed framework integrates decision-level fusion of spatial-domain and frequency-domain representations to enhance grading stability. In addition, image preprocessing is investigated, showing that adaptive contrast enhancement effectively emphasizes surface textures critical for quality discrimination. Experimental evaluations demonstrate that the proposed framework achieves superior performance in both defect classification and quality grading compared with existing detection-based approaches. The proposed classification-oriented system provides an efficient and practical integrated solution for automated mango assessment. Full article

15 pages, 2389 KB  
Article
Diffmap: Enhancement Difference Map for Peripheral Prostate Zone Cancer Localization Based on Functional Data Analysis and Dynamic Contrast Enhancement MRI
by Roman Surkant, Jurgita Markevičiūtė, Ieva Naruševičiūtė, Mantas Trakymas, Povilas Treigys and Jolita Bernatavičienė
Electronics 2026, 15(3), 507; https://doi.org/10.3390/electronics15030507 - 24 Jan 2026
Cited by 1 | Viewed by 480
Abstract
Dynamic contrast-enhancement (DCE) modality of MRI is typically considered secondary in prostate cancer (PCa) diagnostics, due to the common interpretation that its diagnostic power is lower than that of other modalities like T2-weighted (T2W) or diffusion-weighted imaging (DWI). To challenge this paradigm, this study introduces a novel concept of a difference map, which relies exclusively on DCE-MRI for the localization of peripheral zone prostate cancer using functional data analysis-based (FDA) signal processing. The proposed workflow uses discrete voxel-level DCE time–signal curves that are transformed into a continuous functional form. First-order derivatives are then used to determine patient-specific time points of greatest enhancement change that adapt to the intrinsic characteristics of each patient, producing diffmaps that highlight regions with pronounced enhancement dynamics, indicative of malignancy. A subsequent normalization step accounts for inter-patient variability, enabling consistent interpretation across subjects and probabilistic PCa localization. The approach is validated on a curated dataset of 20 patients. Evaluation of eight workflow variants is performed using weighted log loss, the best variant achieving a mean log loss of 0.578. This study demonstrates the feasibility and effectiveness of a single-modality, automated, and interpretable approach for peripheral prostate cancer localization based solely on DCE-MRI. Full article
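
The idea of locating the time point of greatest enhancement change from a first-order derivative can be sketched as follows. This is not the paper's method: the authors describe a functional data analysis step that yields a continuous curve, whereas this sketch substitutes simple forward differences on the raw samples. The function name `steepest_enhancement_time` and the sample values are hypothetical.

```python
def steepest_enhancement_time(times, signal):
    """Return (time, slope) at which the finite-difference derivative is largest.

    Assumption: forward differences stand in for the derivative of a
    smoothed functional representation of the time-signal curve.
    """
    if len(times) != len(signal) or len(times) < 2:
        raise ValueError("need two or more samples of equal length")
    best_time, best_slope = None, float("-inf")
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        slope = (signal[i + 1] - signal[i]) / dt
        if slope > best_slope:
            best_slope = slope
            # Report the midpoint of the interval with the steepest rise.
            best_time = (times[i] + times[i + 1]) / 2
    return best_time, best_slope


# Made-up single-voxel curve: acquisition times (s) and signal intensity.
times = [0, 10, 20, 30, 40, 50]
signal = [100, 102, 140, 165, 170, 168]
peak_time, peak_slope = steepest_enhancement_time(times, signal)
```

Because the maximum is taken per curve, the selected time point adapts to each input, which is the patient-specific behavior the abstract describes; a smoothed or basis-expanded curve would be less sensitive to noise than raw differences.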

24 pages, 742 KB  
Article
Hybrid Poly Commitments for Scalable Binius Zero-Knowledge Proofs in Federated Learning
by Hasina Andriambelo, Hery Zo Andriamanohisoa and Naghmeh Moradpoor
Electronics 2026, 15(3), 500; https://doi.org/10.3390/electronics15030500 - 23 Jan 2026
Viewed by 426
Abstract
Federated learning enables collaborative model training without sharing raw data, but practical deployments increasingly require verifiable guarantees that clients compute updates correctly. Zero-knowledge proofs can provide such guarantees, yet existing approaches face scalability limits due to the combined cost of polynomial commitments and fast Fourier transform (FFT) intensive verification. Pairing-based schemes offer compact proofs but incur high prover and verifier overhead, while hash-based constructions reduce algebraic cost at the expense of rapidly growing proof sizes. This paper proposes Hybrid-Commit, a polynomial commitment architecture for Binius zero-knowledge proofs that aligns cryptographic primitives with the algebraic structure of federated learning workloads. The scheme separates verification into additive and multiplicative phases: linear aggregation is handled using batched additive commitments optimized for binary fields, while non-linear constraints are verified via hash-based commitments over sparsely selected FFT domains. Proofs from multiple clients are combined through recursive aggregation while preserving non-interactivity. Experiments demonstrate scalability in prover time and proof size (near-constant prover time across 4–11 clients; 160 bytes per client representing 341× and 813× reductions vs. FRI-PCS and Orion), although verification time (762 ms per client) does not scale favorably, making the scheme suitable for bandwidth-constrained scenarios. The scheme achieves under 2% end-to-end training overhead with no impact on model accuracy, indicating that workload-aware commitment design can improve specific scalability dimensions of zero-knowledge verification in federated learning systems. Full article

23 pages, 1503 KB  
Article
Hallucination-Aware Interpretable Sentiment Analysis Model: A Grounded Approach to Reliable Social Media Content Classification
by Abdul Rahaman Wahab Sait and Yazeed Alkhurayyif
Electronics 2026, 15(2), 409; https://doi.org/10.3390/electronics15020409 - 16 Jan 2026
Viewed by 790
Abstract
Sentiment analysis (SA) has become an essential tool for analyzing social media content in order to monitor public opinion and support digital analytics. Although transformer-based SA models exhibit remarkable performance, they lack mechanisms to mitigate hallucinated sentiment, which refers to the generation of unsupported or overconfident predictions without explicit linguistic evidence. To address this limitation, this study presents a hallucination-aware SA model by incorporating semantic grounding, interpretability-congruent supervision, and neuro-symbolic reasoning within a unified architecture. The proposed model is based on a fine-tuned Open Pre-trained Transformer (OPT) model, using three fundamental mechanisms: a Sentiment Integrity Filter (SIF), a SHapley Additive exPlanations (SHAP)-guided regularization technique, and a confidence-based lexicon-deep fusion module. The experimental analysis was conducted on two multi-class sentiment datasets that contain Twitter (now X) and Reddit posts. In Dataset 1, the suggested model achieved an average accuracy of 97.6% and a hallucination rate of 2.3%, outperforming the current transformer-based and hybrid sentiment models. With Dataset 2, the framework demonstrated strong external generalization with an accuracy of 95.8%, and a hallucination rate of 3.4%, which is significantly lower than state-of-the-art methods. These findings indicate that it is possible to include hallucination mitigation into transformer optimization without any performance degradation, offering a deployable, interpretable, and linguistically complex social media SA framework, which will enhance the reliability of neural systems of language understanding. Full article