Search Results (121)

Search Parameters:
Keywords = method of vector quantization

22 pages, 880 KB  
Article
FedPLC: Federated Learning with Dynamic Cluster Adaptation for Concept Drift on Non-IID Data
by Qi Zhou, Yantao Yu, Jingxiao Ma, Mohammad S. Obaidat, Xing Chang, Mingchen Ma and Shousheng Sun
Sensors 2026, 26(1), 283; https://doi.org/10.3390/s26010283 - 2 Jan 2026
Viewed by 276
Abstract
In practical deployments of decentralized federated learning (FL) in Internet of Things (IoT) environments, the non-independent and identically distributed (Non-IID) nature of client-local data limits model performance, and concept drift further exacerbates this complexity by introducing temporal uncertainty that significantly degrades convergence and generalization. Existing approaches, which mainly rely on model-level similarity or static clustering, struggle to disentangle inherent data heterogeneity from dynamic distributional shifts, resulting in poor adaptability under drift scenarios. This paper proposes FedPLC, a novel FL framework that introduces two mechanism-level innovations: (i) Prototype-Anchored Representation Learning (PARL), a strategy inspired by Learning Vector Quantization (LVQ) that stabilizes the representation space against label noise and distributional shifts by aligning sample embeddings with class prototypes; and (ii) Label-wise Dynamic Community Adaptation (LDCA), a fine-grained adaptation mechanism that dynamically reorganizes classifier heads at the label level, enabling rapid personalization and drift-aware community evolution. Together, PARL and LDCA enable FedPLC to explicitly disentangle static Non-IID heterogeneity from temporal concept drift, achieving robust and fine-grained adaptation for large-scale IoT/edge client populations. Our experimental results on the Fashion-MNIST, CIFAR-10, and SVHN datasets demonstrate that FedPLC outperforms state-of-the-art federated learning methods designed for concept drift in both abrupt drift and incremental drift scenarios.
(This article belongs to the Section Internet of Things)
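
PARL is described here only at the level of aligning sample embeddings with class prototypes in the spirit of Learning Vector Quantization. As a rough, hypothetical illustration of that idea (not the authors' implementation), the sketch below applies a classic LVQ1 update: the nearest prototype is pulled toward an embedding when their labels match and pushed away otherwise. All names, dimensions, and the learning rate are made up.

```python
import numpy as np

def lvq1_update(prototypes, proto_labels, x, y, lr=0.05):
    """One LVQ1 step: pull the winning prototype toward x when the labels
    match, push it away otherwise. Illustrative, not the FedPLC PARL code."""
    dists = np.linalg.norm(prototypes - x, axis=1)
    j = int(np.argmin(dists))                      # winning (nearest) prototype
    sign = 1.0 if proto_labels[j] == y else -1.0   # attract or repel
    prototypes[j] += sign * lr * (x - prototypes[j])
    return prototypes

# Toy usage: three class prototypes in a 4-dimensional embedding space.
rng = np.random.default_rng(0)
class_means = rng.normal(size=(3, 4))              # hypothetical true class centers
prototypes = rng.normal(size=(3, 4))
proto_labels = np.array([0, 1, 2])
for _ in range(200):
    y = int(rng.integers(0, 3))
    x = class_means[y] + 0.1 * rng.normal(size=4)  # noisy embedding of class y
    prototypes = lvq1_update(prototypes, proto_labels, x, y)
```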

43 pages, 6158 KB  
Article
A Multi-Fish Tracking and Behavior Modeling Framework for High-Density Cage Aquaculture
by Xinyao Xiao, Tao Liu, Shuangyan He, Peiliang Li, Yanzhen Gu, Pixue Li and Jiang Dong
Sensors 2026, 26(1), 256; https://doi.org/10.3390/s26010256 - 31 Dec 2025
Viewed by 265
Abstract
Multi-fish tracking and behavior analysis in deep-sea cages face two critical challenges: first, the homogeneity of fish appearance and low image quality render appearance-based association unreliable; second, standard linear motion models fail to capture the complex, nonlinear swimming patterns (e.g., turning) of fish, leading to frequent identity switches and fragmented trajectories. To address these challenges, we propose SOD-SORT, which integrates a Constant Turn-Rate and Velocity (CTRV) motion model within an Extended Kalman Filter (EKF) framework into DeepOCSORT, a recent observation-centric tracker. Through systematic Bayesian optimization of the EKF process noise (Q), observation noise (R), and ReID weighting parameters, we achieve harmonious integration of advanced motion modeling with appearance features. Evaluations on the DeepBlueI validation set show that SOD-SORT attains IDF1 = 0.829 and reduces identity switches by 13% (93 vs. 107) compared to the DeepOCSORT baseline, while maintaining comparable MOTA (0.737). Controlled ablation studies reveal that naive integration of CTRV-EKF with default parameters degrades performance substantially (IDs: 172 vs. 107 baseline), but careful parameter optimization resolves this motion-appearance conflict. Furthermore, we introduce a statistical quantization method that converts variable-length trajectories into fixed-length feature vectors, enabling effective unsupervised classification of normal and abnormal swimming behaviors in both the Fish4Knowledge coral reef dataset and real-world Deep Blue I cage videos. The proposed approach demonstrates that principled integration of advanced motion models with appearance cues, combined with high-quality continuous trajectories, can support reliable behavior modeling for aquaculture monitoring applications.
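
The CTRV motion model inside the EKF is a textbook component, so its prediction step can be sketched independently of the SOD-SORT code. The state layout [x, y, v, yaw, yaw_rate], the time step, and the toy numbers below are assumptions for illustration only.

```python
import numpy as np

def ctrv_predict(state, dt):
    """Propagate a [x, y, v, yaw, yaw_rate] state with the constant
    turn-rate and velocity (CTRV) model. Illustrative sketch only."""
    x, y, v, yaw, w = state
    if abs(w) > 1e-6:                      # turning: move along a circular arc
        x += (v / w) * (np.sin(yaw + w * dt) - np.sin(yaw))
        y += (v / w) * (-np.cos(yaw + w * dt) + np.cos(yaw))
    else:                                  # nearly straight: constant-velocity fallback
        x += v * np.cos(yaw) * dt
        y += v * np.sin(yaw) * dt
    yaw += w * dt
    return np.array([x, y, v, yaw, w])

# A fish turning at 0.5 rad/s, sampled at 25 fps for one second.
state = np.array([0.0, 0.0, 1.2, 0.0, 0.5])
for _ in range(25):
    state = ctrv_predict(state, dt=1 / 25)
print(state[:2])   # predicted position after one second of turning
```

In the full EKF the covariance is also propagated through the Jacobian of this transition, and the process-noise matrix Q is what the paper tunes via Bayesian optimization.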

32 pages, 5285 KB  
Article
Simultaneous Reversible Data Hiding and Quality Enhancement for VQ-Compressed Images via Quality Improvement Codes
by Chun-Hsiu Yeh, Chung-Wei Kuo, Xian-Zhong Lin, Wei-Cheng Shen and Chin-Wei Liao
Electronics 2025, 14(22), 4463; https://doi.org/10.3390/electronics14224463 - 16 Nov 2025
Viewed by 393
Abstract
With the rapid proliferation of digital multimedia in resource-constrained Internet of Things (IoT) environments, there is growing demand for efficient image compression combined with secure data embedding. Existing Vector Quantization (VQ)-based Reversible Data Hiding (RDH) methods prioritize embedding capacity while neglecting reconstruction fidelity, often introducing noticeable quality degradation in edge regions—unacceptable for high-fidelity applications such as medical imaging and forensic analysis. This paper proposes a lightweight RDH framework, built on a VQ codebook trained once offline, that simultaneously performs secure data embedding and visual quality enhancement for VQ-compressed images. Quality Improvement Codes (QIC) are generated from pixel-wise residuals between original and VQ-decompressed images and embedded into the VQ index table using a novel Recoding Index Value (RIV) mechanism without additional transmission overhead. Sobel edge detection identifies perceptually sensitive blocks for targeted enhancement. Comprehensive experiments on ten standard test images across multiple resolutions (256 × 256, 512 × 512) and codebook sizes (64–1024) demonstrate Peak Signal-to-Noise Ratio (PSNR) gains of +4 to +5.39 dB and Structural Similarity Index Measure (SSIM) improvements of +4.12% to +9.86%, with embedding capacities approaching 100 Kbits. The proposed approach consistently outperforms existing methods in both image quality and payload capacity while eliminating the computational overhead of deep learning models, making it highly suitable for resource-constrained edge devices and real-time multimedia security applications.
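
The scheme builds on plain vector quantization of image blocks: each block is replaced by the index of its nearest codeword, producing the index table into which the Quality Improvement Codes are later embedded. The block size, random codebook, and helper names below are placeholders in a minimal sketch of that VQ front end, not the paper's RIV embedding itself.

```python
import numpy as np

def vq_encode(image, codebook, block=4):
    """Map each block x block patch to the index of its nearest codeword
    (Euclidean distance). Returns the VQ index table. Illustrative sketch."""
    h, w = image.shape
    rows, cols = h // block, w // block
    index_table = np.empty((rows, cols), dtype=np.int32)
    for i in range(rows):
        for j in range(cols):
            patch = image[i*block:(i+1)*block, j*block:(j+1)*block].reshape(-1)
            d = np.sum((codebook - patch) ** 2, axis=1)
            index_table[i, j] = int(np.argmin(d))
    return index_table

def vq_decode(index_table, codebook, block=4):
    """Rebuild the compressed image from the index table."""
    rows, cols = index_table.shape
    out = np.empty((rows * block, cols * block), dtype=codebook.dtype)
    for i in range(rows):
        for j in range(cols):
            out[i*block:(i+1)*block, j*block:(j+1)*block] = \
                codebook[index_table[i, j]].reshape(block, block)
    return out

rng = np.random.default_rng(1)
codebook = rng.uniform(0, 255, size=(256, 16))   # placeholder codebook; real ones are trained with LBG/k-means
img = rng.uniform(0, 255, size=(256, 256))
indices = vq_encode(img, codebook)
recon = vq_decode(indices, codebook)
```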

21 pages, 3303 KB  
Article
Reference-Vector Removed Product Quantization for Approximate Nearest Neighbor Search
by Yang Wang, Ce Xu and Xueyi Wu
Appl. Sci. 2025, 15(21), 11540; https://doi.org/10.3390/app152111540 - 29 Oct 2025
Viewed by 812
Abstract
This paper proposes a decorrelation scheme based on product quantization, termed Reference-Vector Removed Product Quantization (RvRPQ), for approximate nearest neighbor (ANN) search. The core idea is to capture the redundancy among database vectors by representing them with compactly encoded reference-vectors, which are then subtracted from the original vectors to yield residual vectors. We provide a theoretical derivation for obtaining the optimal reference-vectors. This preprocessing step significantly improves the quantization accuracy of the subsequent product quantization applied to the residuals. To maintain low online computational complexity and control memory overhead, we apply vector quantization to the reference-vectors and allocate only a small number of additional bits to store their indices. Experimental results show that RvRPQ substantially outperforms state-of-the-art ANN methods in terms of retrieval accuracy, while preserving high search efficiency.
(This article belongs to the Section Electrical, Electronics and Communications Engineering)
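
The encoding flow of RvRPQ, subtract a coarsely quantized reference vector and then product-quantize the residual, can be pictured with a minimal sketch. The codebook sizes are arbitrary and the codebooks here are simply random samples rather than the optimally derived reference vectors of the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def nearest(codebook, x):
    """Index of the codeword closest to x (squared Euclidean distance)."""
    return int(np.argmin(np.sum((codebook - x) ** 2, axis=1)))

def encode_rvrpq(x, ref_codebook, sub_codebooks):
    """Encode one vector: subtract its nearest reference vector, then
    product-quantize the residual subvector by subvector. Sketch only."""
    ref_id = nearest(ref_codebook, x)
    residual = x - ref_codebook[ref_id]
    parts = np.split(residual, len(sub_codebooks))      # one part per PQ subspace
    codes = [nearest(cb, p) for cb, p in zip(sub_codebooks, parts)]
    return ref_id, codes

def decode_rvrpq(ref_id, codes, ref_codebook, sub_codebooks):
    residual = np.concatenate([sub_codebooks[i][c] for i, c in enumerate(codes)])
    return ref_codebook[ref_id] + residual

# Toy setup: 128-d vectors, 64 reference vectors, 8 subspaces of 16 dims, 256 codewords each.
data = rng.normal(size=(10000, 128))
ref_codebook = data[rng.choice(len(data), 64, replace=False)]   # placeholder "training"
sub_codebooks = [rng.normal(scale=0.5, size=(256, 16)) for _ in range(8)]
ref_id, codes = encode_rvrpq(data[0], ref_codebook, sub_codebooks)
approx = decode_rvrpq(ref_id, codes, ref_codebook, sub_codebooks)
```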

25 pages, 4755 KB  
Article
DA-GSGTNet: Dynamic Aggregation Gated Stratified Graph Transformer for Multispectral LiDAR Point Cloud Segmentation
by Qiong Ding, Runyuan Zhang, Alex Hay-Man Ng, Long Tang, Bohua Ling, Dan Wang and Yuelin Hou
Remote Sens. 2025, 17(21), 3515; https://doi.org/10.3390/rs17213515 - 23 Oct 2025
Viewed by 728
Abstract
Multispectral LiDAR point clouds, which integrate both geometric and spectral information, offer rich semantic content for scene understanding. However, due to data scarcity and distributional discrepancies, existing methods often struggle to balance accuracy and efficiency in complex urban environments. To address these challenges, we propose DA-GSGTNet, a novel segmentation framework that integrates Gated Stratified Graph Transformer Blocks (GSGT-Block) with Dynamic Aggregation Transition Down (DATD). The GSGT-Block employs graph convolutions to enhance the local continuity of windowed attention in sparse neighborhoods and adaptively fuses these features via a gating mechanism. The DATD module dynamically adjusts k-NN strides based on point density, while jointly aggregating coordinates and feature vectors to preserve structural integrity during downsampling. Additionally, we introduce a relative position encoding scheme using quantized lookup tables with a Euclidean distance bias to improve recognition of elongated and underrepresented classes. Experimental results on a benchmark multispectral point cloud dataset demonstrate that DA-GSGTNet achieves 86.43% mIoU, 93.74% mAcc, and 90.78% OA, outperforming current state-of-the-art methods. Moreover, by fine-tuning from source-domain pretrained weights and using only ~30% of the training samples (4 regions) and 30% of the training epochs (30 epochs), we achieve over 90% of the full-training segmentation accuracy (100 epochs). These results validate the effectiveness of transfer learning for rapid convergence and efficient adaptation in data-scarce scenarios, offering practical guidance for future multispectral LiDAR applications with limited annotation.
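
DATD is described as adjusting k-NN strides according to local point density during downsampling. One crude way to picture that coupling, purely illustrative and unrelated to the authors' code, is to estimate density from the distance to the k-th neighbor and keep fewer points where the cloud is dense:

```python
import numpy as np

def knn_kth_distance(points, k=8):
    """Distance from every point to its k-th nearest neighbor (brute force)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, k]        # column 0 is the point itself

def density_adaptive_downsample(points, k=8, keep_dense=0.25, keep_sparse=0.75):
    """Keep fewer points where the cloud is dense (small k-NN distance) and
    more where it is sparse. Fractions and threshold are arbitrary choices."""
    dk = knn_kth_distance(points, k)
    dense = dk < np.median(dk)
    keep_prob = np.where(dense, keep_dense, keep_sparse)
    rng = np.random.default_rng(3)
    return points[rng.random(len(points)) < keep_prob]

pts = np.random.default_rng(0).normal(size=(500, 3))
print(len(density_adaptive_downsample(pts)))   # roughly half the points survive
```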

20 pages, 2197 KB  
Article
Perceptual Image Hashing Fusing Zernike Moments and Saliency-Based Local Binary Patterns
by Wei Li, Tingting Wang, Yajun Liu and Kai Liu
Computers 2025, 14(9), 401; https://doi.org/10.3390/computers14090401 - 21 Sep 2025
Viewed by 988
Abstract
This paper proposes a novel perceptual image hashing scheme that robustly combines global structural features with local texture information for image authentication. The method starts with image normalization and Gaussian filtering to ensure scale invariance and suppress noise. A saliency map is then generated from a color vector angle matrix using a frequency-tuned model to identify perceptually significant regions. Local Binary Pattern (LBP) features are extracted from this map to represent fine-grained textures, while rotation-invariant Zernike moments are computed to capture global geometric structures. These local and global features are quantized and concatenated into a compact binary hash. Extensive experiments on standard databases show that the proposed method outperforms state-of-the-art algorithms in both robustness against content-preserving manipulations and discriminability across different images. Quantitative evaluations based on ROC curves and AUC values confirm its superior robustness–uniqueness trade-off, demonstrating the effectiveness of the saliency-guided fusion of Zernike moments and LBP for reliable image hashing.
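
Two of the ingredients, LBP texture codes and Hamming-distance comparison of binary hashes, are standard and easy to sketch. The basic 3x3 LBP and toy hash comparison below are generic illustrations, not the paper's saliency-guided pipeline or its Zernike-moment stage.

```python
import numpy as np

def lbp_3x3(gray):
    """Basic 8-neighbor local binary pattern: each interior pixel gets a code
    in 0..255 whose bits mark which neighbors are >= the center value."""
    c = gray[1:-1, 1:-1].astype(int)
    neighbors = [gray[0:-2, 0:-2], gray[0:-2, 1:-1], gray[0:-2, 2:],
                 gray[1:-1, 2:],   gray[2:,   2:],   gray[2:,   1:-1],
                 gray[2:,   0:-2], gray[1:-1, 0:-2]]
    codes = np.zeros(c.shape, dtype=int)
    for bit, n in enumerate(neighbors):
        codes += (n.astype(int) >= c).astype(int) << bit
    return codes

def hamming_distance(hash_a, hash_b):
    """Number of differing bits between two binary hash vectors."""
    return int(np.count_nonzero(hash_a != hash_b))

rng = np.random.default_rng(4)
img = rng.integers(0, 256, size=(64, 64))
codes = lbp_3x3(img)                     # texture codes over the 62x62 interior
h1 = rng.integers(0, 2, size=256)        # two toy 256-bit hashes
h2 = h1.copy()
h2[:10] ^= 1                             # flip 10 bits
print(hamming_distance(h1, h2))          # -> 10
```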

20 pages, 285 KB  
Article
The Role of Symmetry Aspects in Considering the Spin-1 Particle with Two Additional Electromagnetic Characteristics in the Presence of Both Magnetic and Electric Fields
by Alina Ivashkevich, Viktor Red’kov, Elena Ovsiyuk and Alexander Chichurin
Symmetry 2025, 17(9), 1465; https://doi.org/10.3390/sym17091465 - 5 Sep 2025
Viewed by 475
Abstract
In this paper, we study a generalized Duffin–Kemmer equation for a spin-1 particle with two characteristics, anomalous magnetic moment and polarizability, in the presence of external uniform magnetic and electric fields. After separating the variables, we obtained a system of 10 first-order partial differential equations for 10 functions f_A(r, z). To resolve this complicated problem, we first took into account the existing symmetry in the structure of the derived system. The main step consisted of applying a special method for fixing the r-dependence of the ten functions f_A(r, z), A = 1, ..., 10. We used the approach of Fedorov–Gronskiy, according to which the complete 10-component wave function is decomposed into the sum of three projective constituents. The dependence of each constituent on the polar coordinate r is determined by only one corresponding function F_i(r), i = 1, 2, 3. These three basic functions are constructed in terms of confluent hypergeometric functions, and in this process a quantization rule arises due to the presence of the magnetic field. In fact, this approach is a step-by-step algebraization of the system of partial differential equations. After that, we derived a system of 10 ordinary differential equations for the 10 functions f_A(z). This system was solved using the elimination method and with the help of special linear combinations of the involved functions. As a result, we found three separated second-order differential equations, and their solutions were constructed in terms of confluent hypergeometric functions. Thus, in this paper, we obtain three types of solutions for a vector particle with two additional electromagnetic characteristics in the presence of both external uniform magnetic and electric fields.

23 pages, 4446 KB  
Article
A Modular Framework for RGB Image Processing and Real-Time Neural Inference: A Case Study in Microalgae Culture Monitoring
by José Javier Gutiérrez-Ramírez, Ricardo Enrique Macias-Jamaica, Víctor Manuel Zamudio-Rodríguez, Héctor Arellano Sotelo, Dulce Aurora Velázquez-Vázquez, Juan de Anda-Suárez and David Asael Gutiérrez-Hernández
Eng 2025, 6(9), 221; https://doi.org/10.3390/eng6090221 - 2 Sep 2025
Viewed by 908
Abstract
Recent progress in computer vision and embedded systems has facilitated real-time monitoring of bioprocesses; however, lightweight and scalable solutions for resource-constrained settings remain limited. This work presents a modular framework for monitoring Chlorella vulgaris growth by integrating RGB image processing with multimodal sensor fusion. The system incorporates a Logitech C920 camera and low-cost pH and temperature sensors within a compact photobioreactor. It extracts RGB channel statistics, luminance, and environmental data to generate a 10-dimensional feature vector. A feedforward artificial neural network (ANN) with ReLU activations, dropout layers, and SMOTE-based data balancing was trained to classify growth phases: lag, exponential, and stationary. The optimized model, quantized to 8 bits, was deployed on an ESP32 microcontroller, achieving 98.62% accuracy with 4.8 ms inference time and a 13.48 kB memory footprint. Robustness analysis confirmed tolerance to geometric transformations, though variable lighting reduced performance. Principal component analysis (PCA) retained 95% variance, supporting the discriminative power of the features. The proposed system outperformed previous vision-only methods, demonstrating the advantages of multimodal fusion for early detection. Limitations include sensitivity to lighting and validation limited to a single species. Future directions include incorporating active lighting control and extending the model to multi-species classification for broader applicability.
(This article belongs to the Special Issue Artificial Intelligence for Engineering Applications, 2nd Edition)
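
The deployment step, quantizing the trained network to 8 bits for the ESP32, typically follows an affine mapping of floats to integers with a scale and zero point. The standalone sketch below shows that mapping for a weight matrix; it is a generic post-training quantization illustration, not the authors' toolchain.

```python
import numpy as np

def quantize_int8(w):
    """Affine (asymmetric) 8-bit quantization of a float tensor.
    Returns the int8 tensor plus the scale and zero point for dequantization."""
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / 255.0 if w_max > w_min else 1.0
    zero_point = int(round(-w_min / scale)) - 128       # maps w_min to about -128
    q = np.clip(np.round(w / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(5)
weights = rng.normal(scale=0.3, size=(10, 16)).astype(np.float32)
q, s, z = quantize_int8(weights)
err = np.abs(weights - dequantize(q, s, z)).max()
print(f"max reconstruction error: {err:.5f}")   # on the order of scale / 2
```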

17 pages, 1462 KB  
Article
Key Operator Vectorization for LeNet and ResNet Based on Buddy Compiler
by Juncheng Chen, Weiwei Chen and Zhi Cai
Appl. Sci. 2025, 15(17), 9523; https://doi.org/10.3390/app15179523 - 29 Aug 2025
Viewed by 736
Abstract
Deep learning has emerged as a prominent focus in both academia and industry, with a wide range of models being applied across diverse domains. Fast and efficient model inference is essential for the practical deployment of deep learning models. Under specific hardware constraints, accelerating inference remains a key research challenge. Common techniques for model acceleration include quantization, pruning, and vectorization. Although quantization and pruning primarily reduce model precision or complexity to enhance efficiency, this paper concentrates on vectorization, a technique that accelerates models by increasing the parallelism of operator execution. Based on the open-source Buddy-MLIR project, this work implements vectorization optimizations for Matmul, Conv2d, and Max Pooling operations to improve inference performance. These optimizations are designed as compiler passes and integrated into the Buddy-MLIR framework, offering a general solution for vectorizing such operators. Two optimization approaches are proposed: general vectorization and adaptive vectorization. Compared to the standard MLIR lowering pipeline and the fully optimized LLVM backend, the proposed general and adaptive vectorization methods reduce the inference latency of LeNet-5 by 26.7% and 37.3%, respectively. For the more complex ResNet-18 model, these methods achieve latency reductions of 79.9% and 82.6%, respectively.
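
The reported speedups come from turning scalar operator loops into wider, data-parallel operations. The MLIR passes themselves cannot be reproduced here, but the NumPy comparison below conveys the underlying idea by replacing the innermost scalar loop of a matrix multiply with one vectorized operation per output row; it is a conceptual analogy, not the Buddy-MLIR implementation.

```python
import time
import numpy as np

def matmul_scalar(a, b):
    """Triple scalar loop: one multiply-add at a time."""
    m, k = a.shape
    _, n = b.shape
    out = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i, p] * b[p, j]
            out[i, j] = acc
    return out

def matmul_rowvector(a, b):
    """Innermost loop replaced by a vectorized row-times-matrix product,
    analogous in spirit to what a vectorization pass does for the hardware."""
    out = np.empty((a.shape[0], b.shape[1]))
    for i in range(a.shape[0]):
        out[i] = a[i] @ b        # one data-parallel operation per output row
    return out

a = np.random.default_rng(6).normal(size=(64, 64))
b = np.random.default_rng(7).normal(size=(64, 64))
t0 = time.perf_counter()
c1 = matmul_scalar(a, b)
t1 = time.perf_counter()
c2 = matmul_rowvector(a, b)
t2 = time.perf_counter()
print(np.allclose(c1, c2), f"scalar {t1 - t0:.3f}s vs vectorized {t2 - t1:.3f}s")
```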

25 pages, 8472 KB  
Article
Harnessing the Power of Pre-Trained Models for Efficient Semantic Communication of Text and Images
by Emrecan Kutay and Aylin Yener
Entropy 2025, 27(8), 813; https://doi.org/10.3390/e27080813 - 29 Jul 2025
Cited by 1 | Viewed by 1609
Abstract
This paper investigates point-to-point multimodal digital semantic communications in a task-oriented setup, where messages are classified at the receiver. We employ a pre-trained transformer model to extract semantic information and propose three methods for generating semantic codewords. First, we propose semantic quantization that uses quantized embeddings of source realizations as a codebook. We investigate fixed-length coding, considering the source semantic structure and end-to-end semantic distortion. We propose a neural network-based codeword assignment mechanism incorporating codeword transition probabilities to minimize the expected semantic distortion. Second, we present semantic compression that clusters embeddings, exploiting the inherent semantic redundancies to reduce the codebook size, i.e., further compression. Third, we introduce a semantic vector-quantized autoencoder (VQ-AE) that learns a codebook through training. In all cases, we follow this semantic source code with a standard channel code to transmit over the wireless channel. In addition to classification accuracy, we assess pre-communication overhead via a novel metric we term system time efficiency. Extensive experiments demonstrate that our proposed semantic source-coding approaches provide comparable accuracy and better system time efficiency compared to their learning-based counterparts.
(This article belongs to the Special Issue Semantic Information Theory)
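
The first scheme, semantic quantization, amounts to keeping a shared codebook of quantized embeddings and transmitting only the index of the nearest codeword. A minimal sketch of that encode/decode loop follows, with a random codebook standing in for the paper's pre-trained transformer embeddings and learned assignment network.

```python
import numpy as np

def encode_semantic(embedding, codebook):
    """Transmit only the index of the nearest codeword (a fixed-length code)."""
    return int(np.argmin(np.linalg.norm(codebook - embedding, axis=1)))

def decode_semantic(index, codebook):
    """Receiver reconstructs the semantic representation from its codebook copy."""
    return codebook[index]

rng = np.random.default_rng(8)
codebook = rng.normal(size=(1024, 768))          # 1024 codewords -> 10-bit indices
message_embedding = rng.normal(size=768)         # stand-in for a transformer embedding
idx = encode_semantic(message_embedding, codebook)
reconstructed = decode_semantic(idx, codebook)
bits_per_message = int(np.ceil(np.log2(len(codebook))))
print(idx, bits_per_message)                     # index sent over the channel, 10 bits
```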

21 pages, 2467 KB  
Article
Implementation of a Conditional Latent Diffusion-Based Generative Model to Synthetically Create Unlabeled Histopathological Images
by Mahfujul Islam Rumman, Naoaki Ono, Kenoki Ohuchida, Ahmad Kamal Nasution, Muhammad Alqaaf, Md. Altaf-Ul-Amin and Shigehiko Kanaya
Bioengineering 2025, 12(7), 764; https://doi.org/10.3390/bioengineering12070764 - 15 Jul 2025
Viewed by 2565
Abstract
Generative image models have revolutionized artificial intelligence by enabling the synthesis of high-quality, realistic images. These models utilize deep learning techniques to learn complex data distributions and generate novel images that closely resemble the training dataset. Recent advancements, particularly in diffusion models, have led to remarkable improvements in image fidelity, diversity, and controllability. In this work, we investigate the application of a conditional latent diffusion model in the healthcare domain. Specifically, we trained a latent diffusion model using unlabeled histopathology images. Initially, these images were embedded into a lower-dimensional latent space using a Vector Quantized Generative Adversarial Network (VQ-GAN). Subsequently, a diffusion process was applied within this latent space, and clustering was performed on the resulting latent features. The clustering results were then used as a conditioning mechanism for the diffusion model, enabling conditional image generation. Finally, we determined the optimal number of clusters using cluster validation metrics and assessed the quality of the synthetic images through quantitative methods. To enhance the interpretability of the synthetic image generation process, expert input was incorporated into the cluster assignments.
(This article belongs to the Section Biosignal Processing)
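
Selecting the number of clusters with a validation metric is a standard step; a compact silhouette-score computation is sketched below as one such metric. The latent features and labels are synthetic stand-ins, since the paper's VQ-GAN latents and its specific validation metrics are not reproduced here.

```python
import numpy as np

def silhouette_score(x, labels):
    """Mean silhouette over all samples: (b - a) / max(a, b), where a is the
    mean intra-cluster distance and b is the mean distance to the closest
    other cluster. Brute-force version for small feature sets."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    scores = []
    for i in range(len(x)):
        same = labels == labels[i]
        others = same & (np.arange(len(x)) != i)
        a = d[i, others].mean() if others.any() else 0.0
        b = min(d[i, labels == c].mean() for c in set(labels) if c != labels[i])
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return float(np.mean(scores))

rng = np.random.default_rng(9)
latents = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 8)) for c in (0, 3, 6)])
labels = np.repeat([0, 1, 2], 50)
print(round(silhouette_score(latents, labels), 3))   # near 1 for well-separated clusters
```

In practice one would sweep the number of clusters and keep the value that maximizes the chosen validation score.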

17 pages, 1788 KB  
Article
Detection of Double Compression in HEVC Videos Containing B-Frames
by Yoshihisa Furushita, Daniele Baracchi, Marco Fontani, Dasara Shullani and Alessandro Piva
J. Imaging 2025, 11(7), 211; https://doi.org/10.3390/jimaging11070211 - 27 Jun 2025
Viewed by 1140
Abstract
This study proposes a method to detect double compression in H.265/HEVC videos containing B-frames, a scenario underexplored in previous research. The method extracts frame-level encoding features—including frame type, coding unit (CU) size, quantization parameter (QP), and prediction modes—and represents each video as a 28-dimensional feature vector. A bidirectional Long Short-Term Memory (Bi-LSTM) classifier is then trained to model temporal inconsistencies introduced during recompression. To evaluate the method, we created a dataset of 129 HEVC-encoded YUV videos derived from 43 original sequences, covering various bitrate combinations and GOP structures. The proposed method achieved a detection accuracy of 80.06%, outperforming two existing baselines. These results demonstrate the practical applicability of the proposed approach in realistic double compression scenarios.
(This article belongs to the Special Issue Celebrating the 10th Anniversary of the Journal of Imaging)

35 pages, 4844 KB  
Article
A Transductive Zero-Shot Learning Framework for Ransomware Detection Using Malware Knowledge Graphs
by Ping Wang, Hao-Cyuan Li, Hsiao-Chung Lin, Wen-Hui Lin and Nian-Zu Xie
Information 2025, 16(6), 458; https://doi.org/10.3390/info16060458 - 29 May 2025
Cited by 2 | Viewed by 1696
Abstract
Malware continues to evolve rapidly, posing significant challenges to network security. Traditional signature-based detection methods often struggle to cope with advanced evasion techniques such as polymorphism, metamorphism, encryption, and stealth, which are commonly employed by cybercriminals. As a result, these conventional approaches frequently fail to detect newly emerging malware variants in a timely manner. To address this limitation, Zero-Shot Learning (ZSL) has emerged as a promising alternative, offering improved classification capabilities for previously unseen malware samples. ZSL models leverage auxiliary semantic information and binary feature representations to enhance the recognition of novel threats. This study proposes a Transductive Zero-Shot Learning (TZSL) model based on the Vector Quantized Variational Autoencoder (VQ-VAE) architecture, integrated with a malware knowledge graph constructed from sandbox behavioral analysis of ransomware families. The model is further optimized through hyperparameter tuning to maximize classification performance. Evaluation metrics include per-family classification accuracy, precision, recall, F1-score, and Receiver Operating Characteristic (ROC) curves to ensure robust and reliable detection outcomes. In particular, the harmonic mean (H-mean) metric from the Generalized Zero-Shot Learning (GZSL) framework is introduced to jointly evaluate the model’s performance on both seen and unseen classes, offering a more holistic view of its generalization ability. The experimental results demonstrate that the proposed VQ-VAE model achieves an F1-score of 93.5% in ransomware classification, significantly outperforming other baseline models such as LeNet-5 (65.6%), ResNet-50 (71.8%), VGG-16 (74.3%), and AlexNet (65.3%). These findings highlight the superior capability of the VQ-VAE-based TZSL approach in detecting novel malware variants, improving detection accuracy while reducing false positives.
(This article belongs to the Collection Knowledge Graphs for Search and Recommendation)
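
The harmonic-mean (H-mean) criterion from generalized zero-shot learning combines the accuracies on seen and unseen classes; a short helper makes the definition explicit. The example numbers are arbitrary and are not the paper's results.

```python
def gzsl_harmonic_mean(acc_seen, acc_unseen):
    """H = 2 * S * U / (S + U): rewards models that do well on both the
    seen and the unseen (zero-shot) ransomware families."""
    if acc_seen + acc_unseen == 0:
        return 0.0
    return 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)

# A model that aces seen families but misses many unseen ones scores poorly:
print(gzsl_harmonic_mean(0.95, 0.40))   # ~0.563
print(gzsl_harmonic_mean(0.80, 0.75))   # ~0.774
```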

30 pages, 34873 KB  
Article
Text-Guided Synthesis in Medical Multimedia Retrieval: A Framework for Enhanced Colonoscopy Image Classification and Segmentation
by Ojonugwa Oluwafemi Ejiga Peter, Opeyemi Taiwo Adeniran, Adetokunbo MacGregor John-Otumu, Fahmi Khalifa and Md Mahmudur Rahman
Algorithms 2025, 18(3), 155; https://doi.org/10.3390/a18030155 - 9 Mar 2025
Cited by 5 | Viewed by 2508
Abstract
The lack of extensive, varied, and thoroughly annotated datasets impedes the advancement of artificial intelligence (AI) for medical applications, especially colorectal cancer detection. Models trained with limited diversity often display biases, especially when utilized on disadvantaged groups. Generative models (e.g., DALL-E 2, Vector-Quantized Generative Adversarial Network (VQ-GAN)) have been used to generate images but not colonoscopy data for intelligent data augmentation. This study developed an effective method for producing synthetic colonoscopy image data, which can be used to train advanced medical diagnostic models for robust colorectal cancer detection and treatment. Text-to-image synthesis was performed using fine-tuned Visual Large Language Models (LLMs). Stable Diffusion and DreamBooth Low-Rank Adaptation produce images that look authentic, with an average Inception score of 2.36 across three datasets. The validation accuracies of the classification models Big Transfer (BiT), Fixed Resolution Residual Next Generation Network (FixResNeXt), and Efficient Neural Network (EfficientNet) were 92%, 91%, and 86%, respectively. Vision Transformer (ViT) and Data-Efficient Image Transformers (DeiT) had an accuracy rate of 93%. Secondly, for the segmentation of polyps, the ground truth masks are generated using the Segment Anything Model (SAM). Then, five segmentation models (U-Net, Pyramid Scene Parsing Network (PSNet), Feature Pyramid Network (FPN), Link Network (LinkNet), and Multi-scale Attention Network (MANet)) were adopted. FPN produced excellent results, with an Intersection Over Union (IoU) of 0.64, an F1 score of 0.78, a recall of 0.75, and a Dice coefficient of 0.77. This demonstrates strong performance in terms of both segmentation accuracy and overlap metrics, with particularly robust results in balanced detection capability as shown by the high F1 score and Dice coefficient. This highlights how AI-generated medical images can improve colonoscopy analysis, which is critical for early colorectal cancer detection.
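
The segmentation scores reported (IoU, F1, recall, Dice) all reduce to pixel-level counts against the SAM-generated masks. The helper below computes them for a pair of binary masks; it is a generic metric sketch, independent of the paper's models.

```python
import numpy as np

def segmentation_metrics(pred, target):
    """IoU, Dice, precision, recall, and F1 for a pair of binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.logical_and(pred, target).sum()
    fp = np.logical_and(pred, ~target).sum()
    fn = np.logical_and(~pred, target).sum()
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    dice = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"iou": iou, "dice": dice, "precision": precision, "recall": recall, "f1": f1}

rng = np.random.default_rng(10)
gt = rng.random((128, 128)) > 0.7
pred = gt.copy()
pred[:10] = ~pred[:10]                  # corrupt a few rows of the prediction
print({k: round(v, 3) for k, v in segmentation_metrics(pred, gt).items()})
```

For a single pair of binary masks the Dice coefficient and the pixel-level F1 score are mathematically identical; differences in reported values usually come from how the scores are averaged across images.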

15 pages, 1877 KB  
Article
GraphEPN: A Deep Learning Framework for B-Cell Epitope Prediction Leveraging Graph Neural Networks
by Feng Wang, Xiangwei Dai, Liyan Shen and Shan Chang
Appl. Sci. 2025, 15(4), 2159; https://doi.org/10.3390/app15042159 - 18 Feb 2025
Cited by 3 | Viewed by 2882
Abstract
B-cell epitope prediction is crucial for advancing immunology, particularly in vaccine development and antibody-based therapies. Traditional experimental techniques are hindered by high costs, time consumption, and limited scalability, making them unsuitable for large-scale applications. Computational methods provide a promising alternative, enabling high-throughput screening and accurate predictions. However, existing computational approaches often struggle to capture the complexity of protein structures and intricate residue interactions, highlighting the need for more effective models. This study presents GraphEPN, a novel B-cell epitope prediction framework combining a vector quantized variational autoencoder (VQ-VAE) with a graph transformer. The pre-trained VQ-VAE captures both discrete representations of amino acid microenvironments and continuous structural embeddings, providing a comprehensive feature set for downstream tasks. The graph transformer further processes these features to model long-range dependencies and interactions. Experimental results demonstrate that GraphEPN outperforms existing methods across multiple datasets, achieving superior prediction accuracy and robustness. This approach underscores the significant potential for applications in immunodiagnostics and vaccine development, merging advanced deep learning-based representation learning with graph-based modeling.
