MDPI - Publisher of Open Access Journals

29 pages, 8082 KB

Open Access Article

CMYD-SurfaceNet: Scale-Aware Cascaded Multimodal MRI Segmentation via Representation-Level Structural Decoupling and Boundary-Constrained Learning

by Chaymae El Mechal, Mostefa Mesbah, Loubna Mazgouti, Fatima Zahra Ammor and Najiba El Amrani El Idrissi

Digital 2026, 6(2), 49; https://doi.org/10.3390/digital6020049 - 16 Jun 2026

Viewed by 209

Abstract

Reliable delineation of brain tumor boundaries in multimodal magnetic resonance imaging (MRI) remains challenging despite substantial advances in deep learning–based segmentation. Although modern encoder–decoder architectures achieve strong volumetric overlap, precise geometric alignment of tumor contours remains inconsistent, particularly for small lesions and heterogeneous clinical cases. In neuro-oncology, even minor boundary deviations may influence surgical planning, radiotherapy targeting, and longitudinal treatment assessment. These limitations suggest that segmentation performance is not determined solely by network depth or loss design, but also by how multimodal information is structured prior to learning. We introduce CMYD-SurfaceNet, a scale-aware cascaded framework that restructures multimodal MRI inputs at the representation level to enhance boundary-sensitive segmentation. Rather than treating modalities as independently concatenated channels, selected sequences are first organized into a task-guided pseudo-RGB projection. This intermediate representation is subsequently transformed into the CMYK color space to disentangle shared luminance structure from modality-specific contrast dominance. To further encode geometric priors, a gradient-derived boundary density channel is incorporated to explicitly emphasize spatial discontinuities corresponding to tumor margins. The resulting CMYD representation is integrated within a two-stage nnU-Net cascade, where global tumor localization is followed by high-resolution region-of-interest refinement with auxiliary contour supervision. This scale-aware design improves sensitivity to small tumor components while stabilizing contour delineation. Extensive evaluation on the BraTS benchmark demonstrates consistent improvements in boundary-sensitive metrics. 
Compared with baseline nnU-Net, the proposed framework reduces HD95 from 3.6 mm to 2.4 mm and increases Surface Dice at 1 mm tolerance from 0.82 to 0.89, while maintaining competitive Dice performance. These findings suggest that representation-level structural decoupling, when combined with scale-aware refinement, may provide clinically relevant boundary-aware multimodal MRI segmentation support without increasing architectural complexity.

(This article belongs to the Topic Artificial Intelligence Models, Tools and Applications: 2nd Edition)
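
The abstract above names two ingredients, a CMYK transform of a pseudo-RGB image and a gradient-derived boundary channel, without giving formulas. The following is only a minimal sketch of those two ingredients, assuming the textbook RGB-to-CMYK conversion and a central-difference gradient magnitude as the boundary cue. The function names `rgb_to_cmyk` and `gradient_magnitude` are hypothetical, and nothing here assumes how the paper actually combines the channels into its CMYD representation.

```python
def rgb_to_cmyk(r, g, b):
    """Textbook RGB -> CMYK for channel values in [0, 1] (an assumed formula)."""
    k = 1.0 - max(r, g, b)
    if k >= 1.0:
        # Pure black: C, M, Y are undefined, so set them to zero by convention.
        return (0.0, 0.0, 0.0, 1.0)
    scale = 1.0 - k
    return ((1.0 - r - k) / scale,
            (1.0 - g - k) / scale,
            (1.0 - b - k) / scale,
            k)


def gradient_magnitude(img):
    """Central-difference gradient magnitude of a 2D list of floats.

    Border pixels reuse the edge value as their missing neighbour.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = (img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]) / 2.0
            gy = (img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]) / 2.0
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


# A vertical step edge: the gradient is non-zero only next to the step.
step = [[0.0, 0.0, 1.0, 1.0] for _ in range(3)]
edges = gradient_magnitude(step)
```

In this toy example the K channel carries the shared brightness of a pixel while C, M, and Y carry which input dominates, which is one way to read the "shared luminance versus modality-specific contrast" idea in the abstract.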

21 pages, 7368 KB

Open Access Article

IA4CACAO: Deep Learning-Based Classification of Fermented Cocoa Beans (Cut Test Images) in Colombia

by Ariolfo Camacho Velasco, Ramiro S. Avila Chacón, Diego A. Zárate, Lucero G. Rodriguez Silva, German A. Estrada-Bonilla and Cesar A. Vargas

AgriEngineering 2026, 8(6), 206; https://doi.org/10.3390/agriengineering8060206 - 27 May 2026

Viewed by 408

Abstract

Automated and objective grading of cocoa (Theobroma cacao L.) fermentation remains a major challenge because the conventional cut test relies on subjective visual inspection and is difficult to scale. In this study, we develop and evaluate a deep learning pipeline for classifying cocoa bean fermentation levels from expert-annotated cut-test images acquired under controlled conditions, enabling the systematic evaluation and comparison of multiple convolutional and transformer-based architectures under consistent preprocessing, training, and evaluation protocols. The dataset comprises 4347 segmented cocoa bean images distributed across four severely imbalanced classes, namely fermented, under-fermented, slaty, and violet. Representative architectures, including EfficientNet-B0, MobileNetV3-Large, ConvNeXt-XLarge, ViT-Base, and ViT-Large, are benchmarked to analyze the effects of class imbalance, RGB versus HSV color representation, training duration, and label-space formulation. The results show that severe class imbalance strongly degrades performance in direct four-class classification. A hierarchical binary-to-multiclass strategy significantly improves balanced recognition by separating fermented from unfermented beans prior to subclass discrimination, increasing macro-F1 scores from approximately 80–83% to 89–91%. Among the evaluated models, ViT-Base emerges as the most stable architecture across experimental settings and offers the best balance between classification performance, training stability, and computational cost. Although larger models achieve slightly higher peak performance under balanced conditions, ViT-Base provides more consistent results under realistic constraints. The proposed framework enables near-real-time inference on segmented single-bean images and supports objective, reproducible, and scalable fermentation assessment. 
These findings demonstrate that performance in cocoa fermentation grading is determined not only by model capacity, but also by imbalance-aware label-space design and evaluation protocols aligned with real-world cut-test conditions.

(This article belongs to the Section Computer Applications and Artificial Intelligence in Agriculture)

20 pages, 13558 KB

Open Access Article

Deep Hybrid Synesthesia Model for Audio-Image Transfer

by Zhaojie Luo, Jiayong Jiang and Ladóczki Bence

Electronics 2026, 15(10), 2218; https://doi.org/10.3390/electronics15102218 - 21 May 2026

Viewed by 328

Abstract

Most artistic expressions are conveyed through images (e.g., painting) and audio (e.g., music), and deep learning has been successfully applied to neural style transfer within each of these modalities. However, there is still a lack of deep models that explicitly learn to transfer style between images and audio. Motivated by synesthesia, which reflects intrinsic connections between vision and hearing in the human brain, we propose a deep hybrid synesthesia model for audio–image style transfer. Our framework consists of two main components: (1) a component conversion module that learns cross-modal mappings between audio rhythm/spectrum and image color/shape in a continuous valence–arousal (VA) emotion space; and (2) a style conversion module that transfers high-level artistic styles between Eastern (ink-wash, shui-mo) and Western painting and their corresponding musical counterparts. We first learn emotion-aware feature networks that align low-level audio and visual components based on shared affective representations, and then model long-term stylistic structures for cross-modal style transfer. Experiments include “seeing the sound” (audio-to-image generation with controllable components) and full audio–image style transformations. Both objective analyses and subjective evaluations suggest that our model can produce cross-modal artworks whose perceived style and emotional content are consistent with human synesthetic impressions.

(This article belongs to the Special Issue AI-Driven Image Generation: Algorithms, Architectures, and Applications)

17 pages, 3484 KB

Open Access Article

Environmental Preference as a Mediator of Streetscape Vitality: A Chain Mediation Model for Landscape Design

by Tiean Zou, Yutong Zhang, Wenbo Duan, Yuhao Liu, Xin Meng, Yuexin Zhang and Xingyuan Fu

Land 2026, 15(5), 846; https://doi.org/10.3390/land15050846 - 14 May 2026

Viewed by 262

Abstract

Environmental perception is an intrinsic driver of spatial vitality and can be expressed in many ways. Because these forms of perception have received little in-depth study, this study combined theoretical analysis with empirical methods to clarify the role of preference as the common foundation of the different expressions of environmental perception. It also examined how the different expressions of preference interact along the “quality-to-vitality” pathway, and which environmental characteristics are significant for each, in order to link landscape design to urban vitality. Key findings indicate that: (1) three expressions of environmental preference (emotion, satisfaction, and behavioral preference) together support a significant chain mediation pathway (“emotion → satisfaction → behavioral preference”) in the quality-to-vitality process; (2) pedestrian safety infrastructure (e.g., traffic barricades, well-maintained pavements) could support perceived security and walking activity; (3) cultural/recreational facilities need complementary legibility-enhancing elements (appropriate spatial enclosure, pleasant color schemes, architectural coherence) to evoke positive affect; (4) streetscape diversity and visual interest might mitigate the monotony induced by excessive block length, serving to some degree as catalysts of vitality.

20 pages, 19314 KB

Open Access Article

Haptic and Thermal Rendering of Astronomical Data: A Multimodal Approach to Inclusive Science Communication

by Beatriz García, Johanna Casado and Alexis Mancilla

Multimodal Technol. Interact. 2026, 10(5), 54; https://doi.org/10.3390/mti10050054 - 12 May 2026

Viewed by 415

Abstract

Universal Accessibility in Astronomy requires a paradigm shift from visual-centric communication to multisensory data interaction. Because astronomy communication relies inherently on high-resolution imagery and visual metaphors, it creates significant accessibility barriers for blind and low-vision (BLV) audiences. To address this, multimodal encoding offers a feasible and meaningful solution by redistributing information across alternative sensory channels, ensuring that the absence of sight does not preclude the comprehension of spatial data. This article explores the development and evaluation of a low-cost, multimodal tool designed to represent complex astronomical concepts—specifically stellar magnitude and color—through tactile and auditory stimuli. Unlike traditional methods, our approach focuses on the haptic-cognitive link, allowing users to “feel” data through physical relief models. We present a structured impact study involving a heterogeneous group of blind, low-vision, and sighted participants. The methodology followed a mixed-methods approach, including a participatory workshop with 20 individuals and a detailed usability assessment with a core group (n = 6) of blind and low-vision participants. Preliminary results from this pilot phase demonstrate that multimodal integration effectively reduces the perceived mental effort for complex spatial data comprehension. Quantitative and qualitative feedback suggests that tactile-auditory sensory substitution not only improves accessibility but also enhances engagement and information retention across all user groups. These findings highlight the potential of multimodal models in transforming public scientific environments, such as museums and observatories, into inclusive, interactive spaces.

22 pages, 19098 KB

Open Access Article

Symmetry Analysis of Aesthetic Features for Computational Support in Assessment of Art Learning Outcomes

by Yan Ruan and Xiaofei Li

Symmetry 2026, 18(5), 811; https://doi.org/10.3390/sym18050811 - 9 May 2026

Viewed by 259

Abstract

The assessment of art learning outcomes has long relied on teachers’ subjective judgment, facing challenges such as inconsistent evaluation criteria and difficulty in multi-dimensional quantitative analysis. To address these issues, this study proposes a framework for the automatic assessment of art learning outcomes based on symmetry analysis of multi-dimensional aesthetic features. The model quantifies the symmetry between student works and instructional exemplars across three aesthetic dimensions: color distribution features (HSV color space histograms and dominant color composition), compositional features (visual center distribution and structural symmetry), and art movement style features (multi-layer Gram matrices from VGG-19 with PCA dimensionality reduction). Using publicly available artwork datasets, this study constructed Temporal Evolution Pairs (early and late works by the same artist) and Stylistic Inheritance Pairs (works by different artists within the same movement) to validate the model’s effectiveness. The experimental results demonstrate that the proposed multi-dimensional feature fusion strategy achieves 87.6% accuracy in artist style evolution trajectory recognition and 82.3% accuracy in art movement style inheritance quantification, significantly outperforming baseline methods including SSIM (52.3%), VGG-fc features (68.9%), and single style loss (76.4%). Two in-depth case studies further validate the model’s quantitative capability: in analyzing Picasso’s stylistic evolution, the Mastery Index and the Creativity Divergence Index successfully captured the stylistic continuity of adjacent periods (Blue Period to Rose Period: the Mastery Index = 73.6) and the breakthrough innovation of cross-period transformations (Rose Period to Cubism: the Creativity Divergence Index = 82.7). 
t-SNE visualization of the feature space further revealed that deep style features can clearly distinguish different art movements and individual artists, with spatial distances between artists closely corresponding to stylistic affinities. This research offers a computational framework, together with new perspectives and tools, that has potential for art education assessment practice. It is important to emphasize that the reported performance demonstrates the model’s ability to quantify stylistic relationships between artworks but does not yet demonstrate its validity for assessing student learning outcomes in real classroom settings. The current validation is based on art-historical consensus, and direct application to educational contexts will require further validation with actual student works and expert evaluation, which we plan to address in future work.

25 pages, 3616 KB

Open Access Feature Paper Article

Simultaneous Decompositions of Two Sets of Five Quaternion Tensors and Applications in Color Videos Processing

by Zhuo-Heng He, Yu-Fei Jiang, Mei-Ling Deng and Shao-Wen Yu

Mathematics 2026, 14(9), 1558; https://doi.org/10.3390/math14091558 - 5 May 2026

Viewed by 325

Abstract

This paper extends the theory of equivalence canonical forms from quaternion matrices to quaternion tensors under the Einstein product. Motivated by recent results on the simultaneous decomposition of two specific configurations of five quaternion matrices, we establish a comprehensive framework for the corresponding configurations of five quaternion tensors. The core approach leverages bijective transformation maps that establish isomorphisms between quaternion tensor spaces and matrix spaces, allowing us to systematically construct invertible transformation tensors that simultaneously reduce the given tensor quintuples to canonical forms consisting solely of binary entries (0 and 1). A detailed structural analysis of the resulting canonical tensor forms is provided, including explicit dimension formulas for all identity blocks derived from precise rank conditions. To demonstrate practical utility, we integrate the proposed tensor decomposition with the discrete wavelet transform to construct a color video encryption and decryption system. Experimental results confirm perfect reconstruction (PSNR exceeding 300 dB, SSIM equal to 1) and strong security performance: NPCR of 49.8%, UACI of 49.6%, information entropy of 0.9986 bits per pixel, adjacent pixel correlation below 0.03 in absolute value, and a key space exceeding 2⁵¹². The developed theory significantly extends the existing literature on quaternion tensor decompositions and provides powerful tools for multidimensional signal processing.
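
The security figures quoted above include NPCR and UACI. As a rough illustration, the sketch below computes both under their commonly used definitions for two same-sized single-channel images: NPCR as the percentage of pixel positions whose values differ, and UACI as the mean absolute difference normalised by the maximum pixel value. The function name `npcr_uaci` and the 8-bit default are assumptions for illustration, and this is not the authors' evaluation code.

```python
def npcr_uaci(img_a, img_b, max_value=255):
    """Return (NPCR %, UACI %) for two equally sized 2D lists of integers."""
    flat_a = [p for row in img_a for p in row]
    flat_b = [p for row in img_b for p in row]
    if not flat_a or len(flat_a) != len(flat_b):
        raise ValueError("images must be non-empty and the same size")
    n = len(flat_a)
    # NPCR: share of positions where the two images differ at all.
    changed = sum(1 for a, b in zip(flat_a, flat_b) if a != b)
    # UACI: average intensity change relative to the full value range.
    total_diff = sum(abs(a - b) for a, b in zip(flat_a, flat_b))
    return 100.0 * changed / n, 100.0 * total_diff / (max_value * n)


# Toy 2x2 example: two of the four pixels differ.
npcr, uaci = npcr_uaci([[0, 255], [10, 10]], [[255, 255], [10, 20]])
```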

24 pages, 598 KB

Open Access Article

Color Transformations Resulting in Loss of Performance in Modern Video Compression Software Systems

by Marek Domański, Adam Grzelka and Olgierd Stankiewicz

Information 2026, 17(4), 366; https://doi.org/10.3390/info17040366 - 13 Apr 2026

Viewed by 339

Abstract

Modern video compression is implemented in complex software systems that reuse software modules from various sources. This is particularly evident in experimental software systems designed for researching and standardizing new compression technologies. These systems often incorporate software modules operating in different color spaces. For example, AI-based techniques are often used in video coding experiments. The corresponding software modules often operate on RGB representations, while other modules operate on YCbCr components. In this study, we demonstrate that the quality loss resulting from color transformations is comparable to the respective quantization noise. Consecutive cycles of color transformations do not result in significant additional degradation. However, for image compression, very different results are obtained in different color representations. This aspect must be carefully considered in compression research. This paper supports these considerations with extensive experimental results in the context of ITU Recommendations BT.709 and BT.2020, as well as AVC and HEVC compression.

(This article belongs to the Special Issue Signal Processing and Machine Learning, 2nd Edition)
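
The round-trip loss described in the abstract above comes from rounding to integers in each color space. A minimal sketch of one RGB to YCbCr to RGB cycle follows, assuming full-range 8-bit coding with the BT.709 luma coefficients (Kr = 0.2126, Kb = 0.0722). The range convention, the bit depth, and the helper names are assumptions made for illustration; the abstract does not state the paper's experimental settings.

```python
KR, KB = 0.2126, 0.0722      # BT.709 luma coefficients
KG = 1.0 - KR - KB


def clip8(x):
    """Round to the nearest integer and clamp to the 8-bit range."""
    return max(0, min(255, int(round(x))))


def rgb_to_ycbcr(r, g, b):
    """Full-range 8-bit RGB -> YCbCr (chroma offset 128), rounded to integers."""
    y = KR * r + KG * g + KB * b
    cb = (b - y) / (2.0 * (1.0 - KB)) + 128.0
    cr = (r - y) / (2.0 * (1.0 - KR)) + 128.0
    return clip8(y), clip8(cb), clip8(cr)


def ycbcr_to_rgb(y, cb, cr):
    """Inverse of rgb_to_ycbcr, again rounded to 8-bit integers."""
    r = y + 2.0 * (1.0 - KR) * (cr - 128.0)
    b = y + 2.0 * (1.0 - KB) * (cb - 128.0)
    g = (y - KR * r - KB * b) / KG
    return clip8(r), clip8(g), clip8(b)


def round_trip(rgb, cycles=1):
    """Apply the forward and inverse transform `cycles` times."""
    for _ in range(cycles):
        rgb = ycbcr_to_rgb(*rgb_to_ycbcr(*rgb))
    return rgb


original = (200, 30, 90)
restored = round_trip(original)
max_error = max(abs(a - b) for a, b in zip(original, restored))
```

Calling `round_trip(original, cycles=5)` and comparing against a single cycle is one simple way to explore the abstract's point about consecutive cycles on individual pixels.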

17 pages, 2594 KB

Open Access Article

Dunhuang Mural Style Transfer Using Vision Mamba: In-Context Prompting and Physically Motivated HSV Modulation

by Peijun Qin, Long Liu, Hongjuan Wang, Siyuan Ma, Cui Chen, Zixuan Han and Mingzhi Cheng

Electronics 2026, 15(8), 1578; https://doi.org/10.3390/electronics15081578 - 9 Apr 2026

Viewed by 435

Abstract

Digital stylization of Dunhuang murals can support cultural heritage revitalization by transferring their distinctive aesthetics to modern images, but existing methods face practical limitations. Transformer-based models can yield high visual quality, but often at a prohibitive computational cost. In contrast, standard state space models (SSMs) are more efficient but tend to incur issues such as semantic loss, inconsistent stylization, and an undesired coupling between color and structure when processing the complex textures of historical murals. To address these issues, we propose Dh-Mamba, a hierarchical visual Mamba framework tailored for high-fidelity Dunhuang mural style transfer. Dh-Mamba introduces a CrossMamba in-context style injection mechanism. This mechanism prefixes the style token sequence to the content sequence, which enables globally consistent style propagation as a persistent memory and retains linear-time efficiency. We also designed two additional components: a Modulated Style Perception Module (Δt) and an Orthogonal Decoupled HSV Modulator. The former adaptively regulates texture injection based on style complexity. The latter models mineral pigment palettes and mitigates oxidation-related artifacts by disentangling hue, saturation, and value. Experiments on a custom Dunhuang dataset show that Dh-Mamba improves content preservation and produces more natural mural textures than recent state-of-the-art methods; multiple quantitative metrics corroborate these gains. With 20.04 million parameters, Dh-Mamba provides a resource-efficient solution suitable for deployment in resource-constrained terminal applications for cultural heritage preservation.

(This article belongs to the Special Issue AI-Driven Image Generation: Algorithms, Architectures, and Applications)

28 pages, 105542 KB

Open Access Article

Underwater Image Enhancement via HSV-CS Representation and Perception-Driven Adaptive Fusion

by Fengxu Guan, Tong Guo and Yuzhu Zhang

Remote Sens. 2026, 18(7), 986; https://doi.org/10.3390/rs18070986 - 25 Mar 2026

Viewed by 725

Abstract

Underwater images often suffer from color distortion and low contrast, severely limiting the reliability of visual perception systems. Existing methods struggle to balance enhancement quality and computational efficiency. To address this issue, we propose PCF-Net (Perception-driven Color Fusion Network), a lightweight dual-branch network for underwater image enhancement based on a stable HSV-CS (Hue-Saturation-Value with sine–cosine transformation) color-space representation. Specifically, a sine–cosine transformation is introduced to construct a stable HSV-CS color space, effectively avoiding hue discontinuities at boundary regions in conventional HSV representations. To compensate for underwater degradation, a Color-Bias-Aware module and a Value-Confidence module are designed to adaptively correct color distortion and luminance degradation. Furthermore, a lightweight Channel-Spatial Adaptive Gated Fusion module dynamically aggregates features from the RGB and HSV-CS branches in a perception-driven manner. The overall architecture incorporates multi-branch re-parameterizable convolutions, significantly reducing computational cost while preserving strong representational capacity. Extensive experiments on underwater image enhancement benchmarks, including UIEB and RUIE, demonstrate that PCF-Net achieves state-of-the-art performance in terms of PSNR, SSIM, and UIQM, along with visually superior color correction and contrast enhancement. With only 0.17 M parameters, the proposed model runs at 118.6 FPS on an RTX 3090 and 35.3 FPS on a Jetson Orin Nano at a resolution of 512 × 512, making it well suited for resource-constrained real-time underwater vision applications.

(This article belongs to the Special Issue Deep Learning for Remote Sensing Image Enhancement)
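
The hue discontinuity mentioned in the abstract above can be illustrated with a short sketch. It assumes the hue angle is replaced by its cosine and sine while saturation and value are kept unchanged; the exact HSV-CS definition used in the paper may differ, and the names `rgb_to_hsv_cs` and `distance` are hypothetical.

```python
import colorsys
import math


def rgb_to_hsv_cs(r, g, b):
    """Map RGB in [0, 1] to (cos h, sin h, s, v), with h the hue angle in radians."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)   # h is a fraction of a full turn
    angle = 2.0 * math.pi * h
    return (math.cos(angle), math.sin(angle), s, v)


def distance(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two nearly identical reds whose hues fall on opposite sides of the wrap-around.
red_a = (1.0, 0.02, 0.0)   # hue just above 0
red_b = (1.0, 0.0, 0.02)   # hue just below 1

# In plain HSV the hue values look almost maximally far apart ...
hsv_gap = abs(colorsys.rgb_to_hsv(*red_a)[0] - colorsys.rgb_to_hsv(*red_b)[0])
# ... while in the sine-cosine encoding the two colors stay close together.
cs_gap = distance(rgb_to_hsv_cs(*red_a), rgb_to_hsv_cs(*red_b))
```

The design trade-off is one extra channel in exchange for a representation in which perceptually close hues are also numerically close.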

23 pages, 7102 KB

Open Access Article

Detection of Uniform Corrosion in Steel Pipes Using a Mobile Artificial Vision System

by Rafael Antonio Rodríguez Ospino, Cristhian Manuel Durán Acevedo and Jeniffer Katerine Carrillo Gómez

Corros. Mater. Degrad. 2026, 7(1), 21; https://doi.org/10.3390/cmd7010021 - 20 Mar 2026

Viewed by 1163

Abstract

Corrosion in steel pipelines can cause critical failures in industrial systems, while conventional inspection methods such as radiography and ultrasonic testing are costly and require specialized personnel. This study presents a mobile computer vision system for automated corrosion detection inside steel pipes using deep learning-based visual analysis. The proposed system consists of a Raspberry Pi 4-based mobile robot equipped with a high-resolution camera for internal inspection. Acquired images were processed using color-space transformations (RGB–HSV), filtering, and segmentation. Convolutional neural networks and segmentation models, including YOLOv8-seg (instance segmentation) and DeepLabV3 (semantic segmentation), were trained on a custom corrosion image dataset to identify corroded regions. Real-time visualization was implemented via Flask-based video streaming. Experimental results demonstrated high detection accuracy for uniform corrosion, achieving a mean Intersection over Union (mIoU) above 0.98 and a precision of 0.99 with the YOLOv8-seg model. These results indicate that the proposed system enables reliable and automated corrosion inspection, with the potential to reduce inspection costs and improve operational efficiency. Future work will focus on enhancing real-time performance through hardware optimization.

23 pages, 12225 KB

Open Access Article

Stain-Standardized Deep Learning Framework for Robust Leukocyte Segmentation Across Heterogeneous Cytological Datasets

by Leila Ryma Lazouni, Mourtada Benazzouz, Fethallah Hadjila, Mohammed El Amine Lazouni and Mostafa El Habib Daho

Information 2026, 17(3), 262; https://doi.org/10.3390/info17030262 - 5 Mar 2026

Viewed by 684

Abstract

Accurate leukocyte segmentation remains challenging in automated hematological analysis due to staining variability, heterogeneous imaging conditions, and morphological diversity across cytological datasets, severely limiting deep learning model generalization. This work proposes a dual-module framework designed to achieve stain-invariant and robust leukocyte segmentation. The first module performs explicit stain standardization by combining a VGG-based encoder, a transformer bottleneck, and a convolutional decoder to harmonize diverse inputs toward a Wright–Giemsa reference appearance. The second module introduces a multi-encoder segmentation architecture integrating complementary spatial, leukocyte-specific, and nucleus-focused representations extracted from multiple color spaces. The framework is evaluated on six public and clinical datasets covering multiple staining protocols, magnifications, and imaging scenarios. Experimental results demonstrate consistent high performance, with Dice coefficients exceeding 96% on most datasets and systematic improvements over state-of-the-art methods. Extensive ablation studies confirm the synergistic contributions of stain-standardization and multi-encoder fusion to model robustness and cross-dataset generalization. This framework overcomes stain variability and domain shift, offering a practical tool for automated leukocyte analysis in clinical settings.

(This article belongs to the Special Issue Advanced AI and Data-Driven Learning Methods for Healthcare Applications)

26 pages, 30049 KB

Open Access Article

HVIFormer: A Dual-Stage Low-Light Image Enhancement Method Based on HVI Representation

by Yimei Li, Liuhong Luo and Hongjun Li

Appl. Sci. 2026, 16(5), 2450; https://doi.org/10.3390/app16052450 - 3 Mar 2026

Viewed by 778

Abstract

Low-light image enhancement improves the quality of video surveillance and image analysis and, as a result, has long been a hot topic in image processing. However, current research on this topic faces a difficult challenge: effectively suppressing noise while improving brightness and maintaining color consistency, especially in extremely dark scenes, where dark noise amplification, uneven exposure, and color shifts often interact, leading to detail loss and color distortion. To address this issue, we propose a dual-stage low-light enhancement framework based on the HVI (Horizontal/Vertical-Intensity) color space. The low-light image is first mapped to the HVI space, obtaining the intensity component I and the HVI-based feature map, with I being explicitly extracted as an intensity prior. A Transformer-based pre-recovery module is introduced for global dependency modeling, guided by the intensity prior I through an Intensity-Conditioned Block (ICB) for conditional feature interaction. Subsequently, a dual-branch enhancement network utilizes lightweight Complementary Cross-Attention (CCA) blocks for brightness refinement and color denoising. Finally, the enhanced image is remapped to the sRGB color space. The proposed framework decouples global brightness recovery and feature preprocessing from detail enhancement and color refinement, improving stability in extremely dark and high-noise scenarios. Through 18 quantitative and qualitative experiments, we demonstrate that our proposed method achieves superior performance in dark noise suppression and color restoration across multiple low-light datasets.

16 pages, 4072 KB

Open AccessArticle

SCGViT: A Pseudo-Multimodal Low-Latency Framework for Real-Time Skin Lesion Diagnosis

by Zirui Luo, Chengyu Hou and Haishi Wang

Electronics 2026, 15(4), 845; https://doi.org/10.3390/electronics15040845 - 16 Feb 2026

Viewed by 502

Abstract

To address insufficient medical image feature extraction, limited classification accuracy, and high computational complexity in the automatic diagnosis of skin lesions in edge computing environments, this paper proposes SCGViT, a real-time pseudo-multimodal low-latency diagnosis framework based on a vision transformer. The framework is constructed around three functional objectives: mitigating data imbalance through generative modeling, capturing diverse representations via multi-dimensional perception, and optimizing feature fusion through adaptive refinement. First, Class-Conditioned Generative Adversarial Networks (CGANs) simulate the manifolds of minority-class samples in latent space, achieving a preliminary balance of the data distribution. Second, a branched feature-extraction path is constructed to simulate inversion (INV) and infrared (IR) modes from the original RGB input, in order to achieve multi-dimensional perception. Finally, a cross-attention mechanism is used for cross-branch feature aggregation, and a channel-attention mechanism (squeeze-and-excitation) is embedded for secondary refinement of the mixed global-local features, enhancing the representation of key pathological regions by integrating complementary structural and contrast information. Experimental results on the HAM10000 dataset show an F1 score of 0.973, an inference speed of 304.439 FPS, a parameter count of only 0.524 M, and a computational complexity of only 0.866 GFLOPs, achieving a balance between high accuracy and light weight.
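As a rough illustration of two ideas named in this abstract, the sketch below derives pseudo-modal views from an RGB image and applies squeeze-and-excitation channel attention to the stacked result. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the inversion formula, the red-weighted stand-in for an infrared-like view, and the random weights are all hypothetical, and plain channel concatenation is used here in place of the cross-attention fusion the abstract describes.

```python
import numpy as np

def pseudo_modalities(rgb):
    """Derive two extra views from an RGB image with float values in [0, 1].

    Both formulas are illustrative assumptions, not the paper's definitions.
    """
    inv = 1.0 - rgb  # inversion (INV) view: photographic negative
    # Hypothetical stand-in for an infrared-like (IR) view: a red-weighted
    # grayscale image repeated over three channels to keep shapes aligned.
    gray = rgb @ np.array([0.6, 0.3, 0.1])
    ir = np.repeat(gray[..., None], 3, axis=-1)
    return inv, ir

def squeeze_excite(features, w1, w2):
    """Squeeze-and-excitation channel attention over an (H, W, C) feature map."""
    squeezed = features.mean(axis=(0, 1))           # squeeze: global average pool, (C,)
    hidden = np.maximum(squeezed @ w1, 0.0)         # bottleneck with ReLU
    gates = 1.0 / (1.0 + np.exp(-(hidden @ w2)))    # excitation: sigmoid gates in (0, 1)
    return features * gates                         # rescale each channel

rng = np.random.default_rng(0)
rgb = rng.random((8, 8, 3))
inv, ir = pseudo_modalities(rgb)

# Simple concatenation of the three views (an assumption; the abstract
# describes cross-attention for cross-branch aggregation instead).
stacked = np.concatenate([rgb, inv, ir], axis=-1)   # shape (8, 8, 9)

# Untrained random weights, used only to exercise the shapes.
w1 = rng.standard_normal((9, 3)) * 0.1
w2 = rng.standard_normal((3, 9)) * 0.1
refined = squeeze_excite(stacked, w1, w2)
print(refined.shape)
```

Because the sigmoid gates lie strictly between 0 and 1, each channel is attenuated rather than amplified; in a trained network the gate weights would be learned so that informative channels are attenuated least.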

21 pages, 3373 KB

Open AccessArticle

A Lightweight Fire Detection Framework for Edge Visual Sensors Using Small-Sample Domain Adaptation

by Jie Hu, Ruitong Yao, Qingyuan Yang, Yuning Ding, Long Zhang and Juan Liu

Sensors 2026, 26(4), 1121; https://doi.org/10.3390/s26041121 - 9 Feb 2026

Viewed by 660

Abstract

Addressing the challenges in vision-based sensor networks, this study proposes a novel fire detection framework combining Multi-Feature Fusion and Adaptive Support Vector Machine (A-SVM). First, a high-dimensional feature vector is constructed by fusing HSI color space statistics, Local Binary Pattern (LBP) dynamic textures, and Wavelet Transform shape features. A baseline SVM classifier is then trained on source domain data. Second, to overcome the difficulty of acquiring labeled samples in target domains (e.g., strong daytime interference or low nighttime illumination), a small-sample domain adaptation mechanism is introduced. This mechanism fine-tunes the source model parameters using only a few labeled samples from the target domain via regularization constraints. Experimental results demonstrate that, compared with traditional color thresholding methods and unadapted baseline SVMs, the proposed method increases the F1-score by 19% and 30% in typical daytime and nighttime cross-domain scenarios, respectively. This study effectively achieves low-cost, high-precision, and robust cross-scenario fire detection, making it highly suitable for deployment on resource-constrained edge computing nodes within smart sensor networks.

(This article belongs to the Section Internet of Things)
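The small-sample adaptation step described in the abstract above can be sketched as a linear SVM whose regularizer penalizes distance from the source weights rather than from zero. This is a toy sketch under stated assumptions, not the authors' implementation: the two-feature data, the names `train_linear_svm`, `predict`, and `accuracy`, the subgradient solver, and the hyperparameters are all hypothetical, and the abstract does not specify the exact form of the regularization constraint.

```python
import numpy as np

def train_linear_svm(X, y, w_ref, lam=0.1, lr=0.05, epochs=200):
    """Subgradient descent on  lam/2 * ||w - w_ref||^2 + mean hinge loss.

    Labels y are in {-1, +1}. With w_ref = 0 this is an ordinary
    L2-regularized linear SVM; with w_ref set to the source weights,
    the penalty keeps the adapted model close to the source model.
    """
    w = w_ref.astype(float).copy()
    for _ in range(epochs):
        margins = y * (X @ w)
        active = margins < 1.0  # samples that violate the margin
        grad_hinge = -(X[active] * y[active, None]).sum(axis=0) / len(y)
        w -= lr * (lam * (w - w_ref) + grad_hinge)
    return w

def predict(w, X):
    return np.where(X @ w >= 0, 1, -1)

def accuracy(w, X, y):
    return float(np.mean(predict(w, X) == y))

# Toy source domain: only the first (hypothetical) cue separates the classes.
X_src = np.array([[2.0, 0.0], [3.0, 0.0], [-2.0, 0.0], [-3.0, 0.0]])
y_src = np.array([1, 1, -1, -1])

# Toy target domain: the first cue is misleading, the second is informative.
X_few = np.array([[-1.0, 2.0], [-1.0, 3.0], [1.0, -2.0], [1.0, -3.0]])
y_few = np.array([1, 1, -1, -1])
X_test = np.array([[-1.0, 2.5], [-1.0, 3.5], [1.0, -2.5], [1.0, -3.5]])
y_test = np.array([1, 1, -1, -1])

w_src = train_linear_svm(X_src, y_src, w_ref=np.zeros(2))   # baseline SVM
w_adapted = train_linear_svm(X_few, y_few, w_ref=w_src)     # few-shot adaptation

acc_source = accuracy(w_src, X_test, y_test)
acc_adapted = accuracy(w_adapted, X_test, y_test)
print("unadapted:", acc_source, "adapted:", acc_adapted)
```

The design choice worth noting is the penalty term `lam * (w - w_ref)`: a larger `lam` trusts the source model more and the handful of target labels less, which is the usual trade-off when only a few target samples are available.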

Search Results (230)
