
Application of Information Theory to Computer Vision and Image Processing, 3rd Edition

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Information Theory, Probability and Statistics".

Deadline for manuscript submissions: 15 June 2026 | Viewed by 15209

Special Issue Editors


Guest Editor
Tecnológico Nacional de México, IT de Mexicali, Mexicali 21376, México
Interests: machine vision; stereo vision; laser systems; scanner control; analog and digital processing

Guest Editor
Facultad de Ingeniería, Universidad Autónoma de Baja California, Mexicali 21280, Mexico
Interests: laser scanning; machine vision; remote sensing; support vector machine; measurement error; mobile robotics

Guest Editor
Facultad de Ingeniería, Universidad Autónoma de Baja California, Mexicali B.C. 21280, Mexico
Interests: machine vision; stereo vision; laser systems; scanner control; digital image processing


Special Issue Information

Dear Colleagues,

We are pleased to announce that, following the success of the first and second volumes of “Application of Information Theory to Computer Vision and Image Processing”, a new Special Issue titled “Application of Information Theory to Computer Vision and Image Processing, 3rd Edition” is now open for the submission of papers on related topics.

The application of information theory to computer vision and image processing has significantly advanced both our understanding and the capabilities of computer science. Mathematical methods are applied to signal and image processing to quantify and extract accurate information with ever-increasing efficiency. Moreover, they provide valuable tools and techniques for developing intelligent and adaptive machine vision systems that measure and analyze the amount of information contained within a signal or an image. These include entropy theory, which estimates the average amount of uncertainty or randomness in a dataset: high entropy indicates greater unpredictability, while low entropy suggests a more predictable and structured dataset.
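
As a minimal, self-contained illustration of the entropy measure described above (a generic sketch, not code from any paper in this issue), the following Python snippet computes the Shannon entropy of a discrete sample such as a list of pixel intensities:

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of a discrete sample, e.g. pixel intensities."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A uniform histogram is maximally unpredictable; a skewed one is more structured.
uniform = list(range(256))        # every 8-bit intensity exactly once
skewed = [0] * 192 + [255] * 64   # two intensities, 75% / 25%
print(shannon_entropy(uniform))   # 8.0 bits (the maximum for 256 symbols)
print(shannon_entropy(skewed))    # ≈ 0.81 bits
```

Applied to image regions, high entropy flags rich texture or noise, while near-zero entropy flags flat, redundant regions — the general sense in which entropy is used throughout this issue.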

This Special Issue aims to publish papers on information theory, measurement methods, data processing, and the tools and techniques used in the design and instrumentation of machine vision systems, applying computer vision and image processing to analyze, process, and understand visual data based on the principles of information content, redundancy, and statistical properties.

Dr. Jesús Elías Miranda-Vega
Dr. Wendy Flores-Fuentes
Prof. Dr. Julio Cesar Rodríguez-Quiñonez
Dr. Oleg Sergiyenko
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • information theory
  • entropy and coding theory (data compression, watermarking, minimizing data loss, representing visual information in a more compact form, transmission, storage)
  • computer vision (identifying relevant features and patterns)
  • machine vision (data analysis and understanding, segmentation, registration, denoising and restoration, object recognition, classification and tracking)
  • cyber-physical systems
  • instrumentation
  • signal and image processing
  • measurements (3D spatial coordinates, redundancy, statistical properties)
  • artificial intelligence
  • applications (navigation, surveillance, facial recognition, medicine, robotics, entertainment, and more)

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (10 papers)


Research

32 pages, 31939 KB  
Article
Hierarchical Prototype Alignment for Video Temporal Grounding
by Yun Tian, Xiaobo Guo, Jinsong Wang, Yuming Zhao and Bin Li
Entropy 2026, 28(4), 389; https://doi.org/10.3390/e28040389 - 1 Apr 2026
Viewed by 467
Abstract
Recent advances in vision-language cross-modal learning have substantially improved the performance of video temporal grounding. However, most existing methods directly associate global video features with sentence-level features, overlooking the fact that textual semantics usually correspond to only limited spatio-temporal regions within a video. This limitation often leads to unstable alignment in complex scenarios involving intertwined events and diverse actions. In essence, accurate video temporal grounding requires the joint modeling of fine-grained spatial semantics and heterogeneous temporal event structures. Motivated by this observation, we propose a hierarchical prototype alignment approach that models cross-modal correspondence between video and text through structured intermediate prototype representations. Specifically, the alignment process is decomposed into two complementary stages: object-phrase alignment and event-sentence alignment. In the object-phrase alignment stage, discriminative local visual regions and informative textual words are aggregated to construct object and phrase prototypes, thereby enhancing fine-grained spatial correspondence at the level of entities and localized actions. In the event-sentence alignment stage, object prototypes are further integrated along the temporal dimension to form event prototypes that represent continuous action units, enabling effective alignment with sentence-level semantics and facilitating the modeling of diverse temporal event structures. On this basis, we further directly inject cross-modal alignment information into candidate moment aggregation. This design allows candidate moment representations to emphasize query-relevant temporal regions. Extensive experiments on Charades-STA, ActivityNet Captions, and TACoS demonstrate that the proposed method outperforms existing approaches, validating the effectiveness of hierarchical prototype alignment for improving both cross-modal alignment quality and temporal grounding accuracy.

23 pages, 2302 KB  
Article
Learnable Feature Disentanglement with Temporal-Complemented Motion Enhancement for Micro-Expression Recognition
by Yu Qian, Shucheng Huang and Kai Qu
Entropy 2026, 28(2), 180; https://doi.org/10.3390/e28020180 - 4 Feb 2026
Viewed by 524
Abstract
Micro-expressions (MEs) are involuntary facial movements that reveal genuine emotions, holding significant value in fields like deception detection and psychological diagnosis. However, micro-expression recognition (MER) is fundamentally challenged by the entanglement of subtle emotional motions with identity-specific features. Traditional methods, such as those based on Robust Principal Component Analysis (RPCA), attempt to separate identity and motion components through fixed preprocessing and coarse decomposition. However, these methods can inadvertently remove subtle emotional cues and are disconnected from subsequent module training, limiting the discriminative power of features. Inspired by the Bruce–Young model of facial cognition, which suggests that facial identity and expression are processed via independent neural routes, we recognize the need for a more dynamic, learnable disentanglement paradigm for MER. We propose LFD-TCMEN, a novel network that introduces an end-to-end learnable feature disentanglement framework. The network is synergistically optimized by a multi-task objective unifying orthogonality, reconstruction, consistency, cycle, identity, and classification losses. Specifically, the Disentangle Representation Learning (DRL) module adaptively isolates pure motion patterns from subject-specific appearance, overcoming the limitations of static preprocessing, while the Temporal-Complemented Motion Enhancement (TCME) module integrates purified motion representations—highlighting subtle facial muscle activations—with optical flow dynamics to comprehensively model the spatiotemporal evolution of MEs. Extensive experiments on CAS(ME)3 and DFME benchmarks demonstrate that our method achieves state-of-the-art cross-subject performance, validating the efficacy of the proposed learnable disentanglement and synergistic optimization.

27 pages, 5343 KB  
Article
A Multi-Feature Fusion-Based Two-Stage Method for Airport Crater Extraction from Remote Sensing Images
by Yalun Zhao, Derong Chen and Jiulu Gong
Entropy 2025, 27(12), 1259; https://doi.org/10.3390/e27121259 - 16 Dec 2025
Viewed by 438
Abstract
The accurate extraction of damage information around airport runways is crucial for the rapid development of subsequent damage effect assessment work and the timely formulation of the ensuing operational plan. However, the presence of dark interference areas such as trees and shadows in the background, as well as the increased irregularity at the edge of the crater due to the proximity to the crater, pose challenges to the accurate extraction of the crater area in high entropy images. In this paper, we present a multi-feature fusion-based two-stage method for airport crater extraction from remote sensing images. In stage I, we designed an edge arc segment grouping and matching strategy based on the shape characteristics of craters for preliminary detection. In stage II, we established a crater model based on the regional distribution characteristics of craters and used the marked point processing method for crater detection. In addition, during the step of calculating the magnitude of the edge gradient, we proposed a near-region search strategy, which enhanced the ability of the proposed method to accurately extract craters with irregular shapes. In the test images, the proposed method accurately extracts craters located around and within the runways. Among them, the average recall R and precision P of the proposed method for extracting all craters around the airport runways reached 89% and 87%, respectively, and the average recall R and precision P of the proposed method for extracting craters inside the runways reached 94% and 92%, respectively. Meanwhile, the results of comparative tests showed that our method outperformed other representative algorithms in terms of both crater extraction recall and extraction precision.

20 pages, 26260 KB  
Article
AFMNet: A Dual-Domain Collaborative Network with Frequency Prior Guidance for Low-Light Image Enhancement
by Qianqian An and Long Ma
Entropy 2025, 27(12), 1220; https://doi.org/10.3390/e27121220 - 1 Dec 2025
Viewed by 810
Abstract
Low-light image enhancement (LLIE) degradation arises from insufficient illumination, reflectance occlusion, and noise coupling, and it manifests in the frequency domain as suppressed amplitudes with relatively stable phases. To address the fact that pure spatial mappings struggle to balance brightness enhancement and detail fidelity, whereas pure frequency-domain processing lacks semantic modeling, we propose AFMNet—a dual-domain collaborative enhancement network guided by an information-theoretic frequency prior. This prior regularizes global illumination, while spatial branches restore local details. First, a Multi-Scale Amplitude Estimator (MSAE) adaptively generates fine-grained amplitude-modulation maps via multi-scale fusion, encouraging higher output entropy through adaptive spectral-energy redistribution. Next, a Dual-Branch Spectral–Spatial Attention (DBSSA) module—comprising a Frequency-Modulated Attention Block (FMAB) and a Scale-Variable Depth Attention Block (SVDAB)—is employed: FMAB injects the modulation map as a frequency-domain prior into the attention mechanism to conditionally modulate the amplitude of value features while keeping the phase unchanged, thereby helping to preserve structural information in the enhanced output; SVDAB uses multi-scale depthwise-separable convolutions with scale attention to produce adaptively enhanced spatial features. Finally, a Spectral-Gated Feed-Forward Network (SGFFN) applies learnable spectral filters to local features for band-wise selective enhancement. This collaborative design achieves a favorable balance between illumination correction and detail preservation, and AFMNet delivers state-of-the-art performance on multiple low-light enhancement benchmarks.

17 pages, 765 KB  
Article
Handwritten Digit Recognition with Flood Simulation and Topological Feature Extraction
by Rafał Brociek, Mariusz Pleszczyński, Jakub Błaszczyk, Maciej Czaicki and Christian Napoli
Entropy 2025, 27(12), 1218; https://doi.org/10.3390/e27121218 - 29 Nov 2025
Viewed by 616
Abstract
This paper introduces a novel approach to handwritten digit recognition based on directional flood simulation and topological feature extraction. While traditional pixel-based methods often struggle with noise, partial occlusion, and limited data, our method leverages the structural integrity of digits by simulating water flow from image boundaries using a modified breadth-first search (BFS) algorithm. The resulting flooded regions capture stroke directionality, spatial segmentation, and closed-area characteristics, forming a compact and interpretable feature vector. Additional parameters such as inner cavities, perimeter estimation, and normalized stroke density enhance classification robustness. For efficient prediction, we employ the Annoy approximate nearest neighbors algorithm using ensemble-based tree partitioning. The proposed method achieves high accuracy on the MNIST (95.9%) and USPS (93.0%) datasets, demonstrating resilience to rotation, noise, and limited training data. This topology-driven strategy enables accurate digit classification with reduced dimensionality and improved generalization.

25 pages, 20019 KB  
Article
GLFNet: Attention Mechanism-Based Global–Local Feature Fusion Network for Micro-Expression Recognition
by Meng Zhang, Long Yao, Wenzhong Yang and Yabo Yin
Entropy 2025, 27(10), 1023; https://doi.org/10.3390/e27101023 - 28 Sep 2025
Cited by 2 | Viewed by 1139
Abstract
Micro-expressions are extremely subtle and short-lived facial muscle movements that often reveal an individual’s genuine emotions. However, micro-expression recognition (MER) remains highly challenging due to its short duration, low motion intensity, and the imbalanced distribution of training samples. To address these issues, this paper proposes a Global–Local Feature Fusion Network (GLFNet) to effectively extract discriminative features for MER. Specifically, GLFNet consists of three core modules: the Global Attention (GA) module, which captures subtle variations across the entire facial region; the Local Block (LB) module, which partitions the feature map into four non-overlapping regions to emphasize salient local movements while suppressing irrelevant information; and the Adaptive Feature Fusion (AFF) module, which employs an attention mechanism to dynamically adjust channel-wise weights for efficient global–local feature integration. In addition, a class-balanced loss function is introduced to replace the conventional cross-entropy loss, mitigating the common issue of class imbalance in micro-expression datasets. Extensive experiments are conducted on three benchmark databases, SMIC, CASME II, and SAMM, under two evaluation protocols. The experimental results demonstrate that under the Composite Database Evaluation protocol, GLFNet consistently outperforms existing state-of-the-art methods in overall performance. Specifically, the unweighted F1-scores on the Combined, SAMM, CASME II, and SMIC datasets are improved by 2.49%, 2.02%, 0.49%, and 4.67%, respectively, compared to the current best methods. These results strongly validate the effectiveness and superiority of the proposed global–local feature fusion strategy in micro-expression recognition tasks.

21 pages, 37484 KB  
Article
Reconstructing Hyperspectral Images from RGB Images by Multi-Scale Spectral–Spatial Sequence Learning
by Wenjing Chen, Lang Liu and Rong Gao
Entropy 2025, 27(9), 959; https://doi.org/10.3390/e27090959 - 15 Sep 2025
Cited by 2 | Viewed by 3271
Abstract
With rapid advancements in transformers, the reconstruction of hyperspectral images from RGB images, also known as spectral super-resolution (SSR), has made significant breakthroughs. However, existing transformer-based methods often struggle to balance computational efficiency with long-range receptive fields. Recently, Mamba has demonstrated linear complexity in modeling long-range dependencies and shown broad applicability in vision tasks. This paper proposes a multi-scale spectral–spatial sequence learning method, named MSS-Mamba, for reconstructing hyperspectral images from RGB images. First, we introduce a continuous spectral–spatial scan (CS3) mechanism to improve cross-dimensional feature extraction of the foundational Mamba model. Second, we propose a sequence tokenization strategy that generates multi-scale-aware sequences to overcome Mamba’s limitations in hierarchically learning multi-scale information. Specifically, we design the multi-scale information fusion (MIF) module, which tokenizes input sequences before feeding them into Mamba. The MIF employs a dual-branch architecture to process global and local information separately, dynamically fusing features through an adaptive router that generates weighting coefficients. This produces feature maps that contain both global contextual information and local details, ultimately reconstructing a high-fidelity hyperspectral image. Experimental results on the ARAD_1k, CAVE, and grss_dfc_2018 datasets demonstrate the performance of MSS-Mamba.

23 pages, 7163 KB  
Article
Entropy-Regularized Attention for Explainable Histological Classification with Convolutional and Hybrid Models
by Pedro L. Miguel, Leandro A. Neves, Alessandra Lumini, Giuliano C. Medalha, Guilherme F. Roberto, Guilherme B. Rozendo, Adriano M. Cansian, Thaína A. A. Tosta and Marcelo Z. do Nascimento
Entropy 2025, 27(7), 722; https://doi.org/10.3390/e27070722 - 3 Jul 2025
Cited by 1 | Viewed by 1810
Abstract
Deep learning models such as convolutional neural networks (CNNs) and vision transformers (ViTs) perform well in histological image classification, but often lack interpretability. We introduce a unified framework that adds an attention branch and CAM Fostering, an entropy-based regularizer, to improve Grad-CAM visualizations. Six backbone architectures (ResNet-50, DenseNet-201, EfficientNet-b0, ResNeXt-50, ConvNeXt, CoatNet-small) were trained, with and without our modifications, on five H&E-stained datasets. We measured explanation quality using coherence, complexity, confidence drop, and their harmonic mean (ADCC). Our method increased the ADCC in five of the six backbones; ResNet-50 saw the largest gain (+15.65%), and CoatNet-small achieved the highest overall score (+2.69%), peaking at 77.90% on the non-Hodgkin lymphoma set. The classification accuracy remained stable or improved in four models. These results show that combining attention and entropy produces clearer, more informative heatmaps without degrading performance. Our contributions include a modular architecture for both convolutional and hybrid models and a comprehensive, quantitative explainability evaluation suite.

17 pages, 4019 KB  
Article
Oil-Painting Style Classification Using ResNet with Conditional Information Bottleneck Regularization
by Yaling Dang, Fei Duan and Jia Chen
Entropy 2025, 27(7), 677; https://doi.org/10.3390/e27070677 - 25 Jun 2025
Viewed by 1414
Abstract
Automatic classification of oil-painting styles holds significant promise for art history, digital archiving, and forensic investigation by offering objective, scalable analysis of visual artistic attributes. In this paper, we introduce a deep conditional information bottleneck (CIB) framework, built atop ResNet-50, for fine-grained style classification of oil paintings. Unlike traditional information bottleneck (IB) approaches that minimize the mutual information I(X;Z) between input X and latent representation Z, our CIB minimizes the conditional mutual information I(X;Z|Y), where Y denotes the painting’s style label. We implement this conditional term using a matrix-based Rényi’s entropy estimator, thereby avoiding costly variational approximations and ensuring computational efficiency. We evaluate our method on two public benchmarks: the Pandora dataset (7740 images across 12 artistic movements) and the OilPainting dataset (19,787 images across 17 styles). Our method outperforms the prevalent ResNet with a relative performance gain of 13.1% on Pandora and 11.9% on OilPainting. Beyond quantitative gains, our approach yields more disentangled latent representations that cluster semantically similar styles, facilitating interpretability.

35 pages, 1553 KB  
Article
Efficient Learning-Based Robotic Navigation Using Feature-Based RGB-D Pose Estimation and Topological Maps
by Eder A. Rodríguez-Martínez, Jesús Elías Miranda-Vega, Farouk Achakir, Oleg Sergiyenko, Julio C. Rodríguez-Quiñonez, Daniel Hernández Balbuena and Wendy Flores-Fuentes
Entropy 2025, 27(6), 641; https://doi.org/10.3390/e27060641 - 15 Jun 2025
Viewed by 3687
Abstract
Robust indoor robot navigation typically demands either costly sensors or extensive training data. We propose a cost-effective RGB-D navigation pipeline that couples feature-based relative pose estimation with a lightweight multi-layer-perceptron (MLP) policy. RGB-D keyframes extracted from human-driven traversals form nodes of a topological map; edges are added when visual similarity and geometric–kinematic constraints are jointly satisfied. During autonomy, LightGlue features and SVD give six-DoF relative pose to the active keyframe, and the MLP predicts one of four discrete actions. Low visual similarity or detected obstacles trigger graph editing and Dijkstra replanning in real time. Across eight tasks in four Habitat-Sim environments, the agent covered 190.44 m, replanning when required, and consistently stopped within 0.1 m of the goal while running on commodity hardware. An information-theoretic analysis over the Multi-Illumination dataset shows that LightGlue maximizes per-second information gain under lighting changes, motivating its selection. The modular design attains reliable navigation without metric SLAM or large-scale learning, and seamlessly accommodates future perception or policy upgrades.
