Search Results (10,970)

Search Parameters:
Keywords = image feature learning

20 pages, 49658 KB  
Article
Dead Chicken Identification Method Based on a Spatial-Temporal Graph Convolution Network
by Jikang Yang, Chuang Ma, Haikun Zheng, Zhenlong Wu, Xiaohuan Chao, Cheng Fang and Boyi Xiao
Animals 2026, 16(3), 368; https://doi.org/10.3390/ani16030368 (registering DOI) - 23 Jan 2026
Abstract
In intensive cage rearing systems, accurate dead hen detection remains difficult due to complex environments, severe occlusion, and the high visual similarity between dead hens and live hens in a prone posture. To address these issues, this study proposes a dead hen identification method based on a Spatial-Temporal Graph Convolutional Network (STGCN). Unlike conventional static image-based approaches, the proposed method introduces temporal information to enable dynamic spatial-temporal modeling of hen health states. First, a multimodal fusion algorithm is applied to visible light and thermal infrared images to strengthen multimodal feature representation. Then, an improved YOLOv7-Pose algorithm is used to extract the skeletal keypoints of individual hens, and the ByteTrack algorithm is employed for multi-object tracking. Based on these results, spatial-temporal graph-structured data of hens are constructed by integrating spatial and temporal dimensions. Finally, a spatial-temporal graph convolution model is used to identify dead hens by learning spatial-temporal dependency features from skeleton sequences. Experimental results show that the improved YOLOv7-Pose model achieves an average precision (AP) of 92.8% in keypoint detection. Based on the constructed spatial-temporal graph data, the dead hen identification model reaches an overall classification accuracy of 99.0%, with an accuracy of 98.9% for the dead hen category. These results demonstrate that the proposed method effectively reduces interference caused by feeder occlusion and ambiguous visual features. By using dynamic spatial-temporal information, the method substantially improves robustness and accuracy of dead hen detection in complex cage rearing environments, providing a new technical route for intelligent monitoring of poultry health status. Full article
(This article belongs to the Special Issue Welfare and Behavior of Laying Hens)
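
As a rough sketch of the spatial-temporal graph data described in this abstract, the snippet below stacks per-frame skeleton keypoints (as produced by a pose detector and tracker) into a (channels, time, joints) tensor and builds a normalized skeleton adjacency matrix, the usual inputs to an ST-GCN classifier. The joint count and edge list here are illustrative assumptions, not the authors' skeleton definition.

```python
import numpy as np

# Hypothetical 5-joint hen skeleton: head, neck, body, left leg, right leg.
# Edge list and joint count are illustrative assumptions, not the paper's skeleton.
NUM_JOINTS = 5
EDGES = [(0, 1), (1, 2), (2, 3), (2, 4)]

def normalized_adjacency(num_joints, edges):
    """Symmetrically normalized adjacency with self-loops, as commonly used in ST-GCNs."""
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a.sum(axis=1)))
    return d_inv_sqrt @ a @ d_inv_sqrt

def build_skeleton_sequence(keypoints_per_frame):
    """Stack per-frame (x, y, confidence) keypoints into a (C, T, V) tensor.

    keypoints_per_frame: list of (V, 3) arrays from a pose detector plus tracker.
    """
    seq = np.stack(keypoints_per_frame, axis=0)   # (T, V, 3)
    return np.transpose(seq, (2, 0, 1))           # (C=3, T, V)

# Example: 90 tracked frames for one hen.
frames = [np.random.rand(NUM_JOINTS, 3) for _ in range(90)]
x = build_skeleton_sequence(frames)               # input tensor for an ST-GCN classifier
adj = normalized_adjacency(NUM_JOINTS, EDGES)     # shared spatial graph
print(x.shape, adj.shape)                         # (3, 90, 5) (5, 5)
```
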
24 pages, 10934 KB  
Article
A Few-Shot Object Detection Framework for Remote Sensing Images Based on Adaptive Decision Boundary and Multi-Scale Feature Enhancement
by Lijiale Yang, Bangjie Li, Dongdong Guan and Deliang Xiang
Remote Sens. 2026, 18(3), 388; https://doi.org/10.3390/rs18030388 (registering DOI) - 23 Jan 2026
Abstract
Given the high cost of acquiring large-scale annotated datasets, few-shot object detection (FSOD) has emerged as an increasingly important research direction. However, existing FSOD methods face two critical challenges in remote sensing images (RSIs): (1) features of small targets in RSIs are incompletely represented due to their extremely small scale and cluttered backgrounds, which weakens discriminability and leads to significant detection degradation; (2) unified classification boundaries fail to handle the distinct confidence distributions between well-sampled base classes and sparsely sampled novel classes, leading to ineffective knowledge transfer. To address these issues, we propose TS-FSOD, a Transfer-Stable FSOD framework with two key innovations. First, the proposed detector integrates a Feature Enhancement Module (FEM) leveraging hierarchical attention mechanisms to alleviate small target feature attenuation, and an Adaptive Fusion Unit (AFU) utilizing spatial-channel selection to strengthen target feature representations while mitigating background interference. Second, the Dynamic Temperature-scaling Learnable Classifier (DTLC) employs separate learnable temperature parameters for base and novel classes, combined with difficulty-aware weighting and dynamic adjustment, to adaptively calibrate decision boundaries for stable knowledge transfer. Experiments on the DIOR and NWPU VHR-10 datasets show that TS-FSOD achieves competitive or superior performance compared to state-of-the-art methods, with improvements of up to 4.30% mAP, particularly excelling in 3-shot and 5-shot scenarios. Full article
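
The DTLC described above is characterized only at a high level; the sketch below illustrates the general idea of per-group temperature scaling, where base-class and novel-class logits are divided by separate learnable temperatures before the loss. The cosine-similarity classifier, initialization, and class counts are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class GroupTemperatureClassifier(nn.Module):
    """Cosine-similarity classifier with separate learnable temperatures for base and
    novel classes -- a generic sketch of per-group temperature scaling, not the DTLC code."""

    def __init__(self, feat_dim, num_base, num_novel):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_base + num_novel, feat_dim))
        self.num_base = num_base
        # One temperature per class group; initial values are arbitrary assumptions.
        self.log_tau_base = nn.Parameter(torch.zeros(1))
        self.log_tau_novel = nn.Parameter(torch.zeros(1))

    def forward(self, feats):
        feats = nn.functional.normalize(feats, dim=-1)
        proto = nn.functional.normalize(self.weight, dim=-1)
        logits = feats @ proto.t()                      # cosine similarities
        tau_b = self.log_tau_base.exp()
        tau_n = self.log_tau_novel.exp()
        return torch.cat([logits[:, :self.num_base] / tau_b,
                          logits[:, self.num_base:] / tau_n], dim=1)

clf = GroupTemperatureClassifier(feat_dim=256, num_base=15, num_novel=5)
print(clf(torch.randn(4, 256)).shape)   # torch.Size([4, 20])
```
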
16 pages, 5308 KB  
Article
Patient-Level Classification of Rotator Cuff Tears on Shoulder MRI Using an Explainable Vision Transformer Framework
by Murat Aşçı, Sergen Aşık, Ahmet Yazıcı and İrfan Okumuşer
J. Clin. Med. 2026, 15(3), 928; https://doi.org/10.3390/jcm15030928 (registering DOI) - 23 Jan 2026
Abstract
Background/Objectives: Diagnosing Rotator Cuff Tears (RCTs) via Magnetic Resonance Imaging (MRI) is clinically challenging due to complex 3D anatomy and significant interobserver variability. Traditional slice-centric Convolutional Neural Networks (CNNs) often fail to capture the necessary volumetric context for accurate grading. This study aims to develop and validate the Patient-Aware Vision Transformer (Pa-ViT), an explainable deep-learning framework designed for the automated, patient-level classification of RCTs (Normal, Partial-Thickness, and Full-Thickness). Methods: A large-scale retrospective dataset comprising 2447 T2-weighted coronal shoulder MRI examinations was utilized. The proposed Pa-ViT framework employs a Vision Transformer (ViT-Base) backbone within a Weakly-Supervised Multiple Instance Learning (MIL) paradigm to aggregate slice-level semantic features into a unified patient diagnosis. The model was trained using a weighted cross-entropy loss to address class imbalance and was benchmarked against widely used CNN architectures and traditional machine-learning classifiers. Results: The Pa-ViT model achieved a high overall accuracy of 91% and a macro-averaged F1-score of 0.91, significantly outperforming the standard VGG-16 baseline (87%). Notably, the model demonstrated superior discriminative power for the challenging Partial-Thickness Tear class (ROC AUC: 0.903). Furthermore, Attention Rollout visualizations confirmed the model’s reliance on genuine anatomical features, such as the supraspinatus footprint, rather than artifacts. Conclusions: By effectively modeling long-range dependencies, the Pa-ViT framework provides a robust alternative to traditional CNNs. It offers a clinically viable, explainable decision support tool that enhances diagnostic sensitivity, particularly for subtle partial-thickness tears. Full article
(This article belongs to the Section Orthopedics)
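
To make the weakly supervised MIL aggregation concrete, the sketch below shows a generic attention-based pooling head that turns per-slice embeddings (e.g., from a ViT backbone) into a single patient-level prediction. Layer sizes and the attention form are assumptions; this is not the Pa-ViT code.

```python
import torch
import torch.nn as nn

class AttentionMILHead(nn.Module):
    """Aggregates per-slice embeddings into one patient-level prediction with a learned
    attention weighting -- a generic MIL pooling sketch, not the Pa-ViT implementation."""

    def __init__(self, embed_dim=768, hidden=128, num_classes=3):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(embed_dim, hidden), nn.Tanh(),
                                  nn.Linear(hidden, 1))
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, slice_feats):            # (num_slices, embed_dim) from the backbone
        scores = self.attn(slice_feats)        # (num_slices, 1)
        weights = torch.softmax(scores, dim=0)
        patient_feat = (weights * slice_feats).sum(dim=0)   # attention-weighted average
        return self.classifier(patient_feat), weights.squeeze(-1)

head = AttentionMILHead()
logits, attn = head(torch.randn(24, 768))      # e.g. 24 coronal slices for one exam
print(logits.shape, attn.shape)                # torch.Size([3]) torch.Size([24])
```
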
13 pages, 486 KB  
Review
Machine Learning-Driven Risk Prediction Models for Posthepatectomy Liver Failure: A Narrative Review
by Ioannis Margaris, Maria Papadoliopoulou, Periklis G. Foukas, Konstantinos Festas, Aphrodite Fotiadou, Apostolos E. Papalois, Nikolaos Arkadopoulos and Ioannis Hatzaras
Medicina 2026, 62(2), 237; https://doi.org/10.3390/medicina62020237 (registering DOI) - 23 Jan 2026
Abstract
Background and Objectives: Posthepatectomy liver failure (PHLF) remains a major cause of morbidity and mortality for patients undergoing major liver resections. Recent research highlights the expanding role of machine learning (ML), a crucial subfield of artificial intelligence (AI), in optimizing risk stratification. The aim of the current study was to review, elaborate on and critically analyze the available literature regarding the use of ML-driven risk prediction models for posthepatectomy liver failure. Materials and Methods: A systematic search was conducted in the PubMed/MEDLINE, Scopus and Web of Science databases. Fifteen studies that trained and validated ML models for prediction of PHLF were further included and analyzed. Results: The available literature supports the value of ML-derived models for PHLF prediction. Perioperative clinical, laboratory and imaging features have been combined in a variety of different algorithms to provide interpretable and accurate models for identifying patients at risk of PHLF. The ML-based algorithms have consistently demonstrated high area under the curve and sensitivity values, surpassing traditionally used risk scores in predictive performance. Limitations include the small sample sizes, heterogeneity in the included populations, lack of external validation and a reported poor ability to distinguish between true positive and false positive cases in several studies. Conclusions: Despite the constraints, ML-driven tools, in combination with traditional scoring systems and clinical insight, may enable early and accurate PHLF risk detection, personalized surgical planning and optimization of postoperative outcomes in liver surgery. Full article
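
As a purely illustrative example of the kind of ML risk model surveyed here, the snippet below trains a gradient-boosting classifier on synthetic perioperative features and reports cross-validated ROC AUC, the headline metric discussed in the review. The feature set and data are placeholders, not drawn from any included study.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))      # e.g. bilirubin, INR, platelets, ALBI, remnant volume, age
y = rng.integers(0, 2, size=300)   # PHLF yes/no (synthetic labels for illustration only)

model = GradientBoostingClassifier(random_state=0)
auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"cross-validated AUC: {auc.mean():.2f} +/- {auc.std():.2f}")
```
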
23 pages, 3790 KB  
Article
AI-Powered Thermal Fingerprinting: Predicting PLA Tensile Strength Through Schlieren Imaging
by Mason Corey, Kyle Weber and Babak Eslami
Polymers 2026, 18(3), 307; https://doi.org/10.3390/polym18030307 (registering DOI) - 23 Jan 2026
Abstract
Fused deposition modeling (FDM) suffers from unpredictable mechanical properties in nominally identical prints. Current quality assurance relies on destructive testing or expensive post-process inspection, while existing machine learning approaches focus primarily on printing parameters rather than real-time thermal environments. The objective of this proof-of-concept study is to develop a low-cost, non-destructive framework for predicting tensile strength during FDM printing by directly measuring convective thermal gradients surrounding the print. To accomplish this, we introduce thermal fingerprinting: a novel non-destructive technique that combines Background-Oriented Schlieren (BOS) imaging with machine learning to predict tensile strength during printing. We captured thermal gradient fields surrounding PLA specimens (n = 30) under six controlled cooling conditions using consumer-grade equipment (Nikon D750 camera, household hairdryers) to demonstrate low-cost implementation feasibility. BOS imaging was performed at nine critical layers during printing, generating thermal gradient data that was processed into features for analysis. Our initial dual-model ensemble system successfully classified cooling conditions (100%) and showed promising correlations with tensile strength (initial 80/20 train–test validation: R2 = 0.808, MAE = 0.279 MPa). However, more rigorous cross-validation revealed the need for larger datasets to achieve robust generalization (five-fold cross-validation R2 = 0.301, MAE = 0.509 MPa), highlighting typical challenges in small-sample machine learning applications. This work represents the first successful application of Schlieren imaging to polymer additive manufacturing and establishes a methodological framework for real-time quality prediction. The demonstrated framework is directly applicable to real-time, non-contact quality assurance in FDM systems, enabling on-the-fly identification of mechanically unreliable prints in laboratory, industrial, and distributed manufacturing environments without interrupting production. Full article
(This article belongs to the Special Issue 3D/4D Printing of Polymers: Recent Advances and Applications)
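
The contrast the authors report between the single 80/20 split and five-fold cross-validation can be reproduced in miniature with the sketch below, which scores the same regressor both ways on a small synthetic dataset. The regressor and features are stand-ins, not the paper's pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 8))                              # 30 specimens, 8 thermal-gradient features
y = 50 + 2 * X[:, 0] + rng.normal(scale=1.0, size=30)     # synthetic tensile strength (MPa)

model = RandomForestRegressor(n_estimators=200, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
split_r2 = r2_score(y_te, model.fit(X_tr, y_tr).predict(X_te))   # optimistic single split
cv_r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()  # sterner k-fold estimate
print(f"single 80/20 split R2: {split_r2:.2f}  |  5-fold CV R2: {cv_r2:.2f}")
```
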
43 pages, 9628 KB  
Article
Comparative Analysis of R-CNN and YOLOv8 Segmentation Features for Tomato Ripening Stage Classification and Quality Estimation
by Ali Ahmad, Jaime Lloret, Lorena Parra, Sandra Sendra and Francesco Di Gioia
Horticulturae 2026, 12(2), 127; https://doi.org/10.3390/horticulturae12020127 - 23 Jan 2026
Abstract
Accurate classification of tomato ripening stages and quality estimation is pivotal for optimizing post-harvest management and ensuring market value. This study presents a rigorous comparative analysis of morphological and colorimetric features extracted via two state-of-the-art deep learning-based instance segmentation frameworks—Mask R-CNN and YOLOv8n-seg—and their efficacy in machine learning-driven ripening stage classification and quality prediction. Using 216 fresh-market tomato fruits across four defined ripening stages, we extracted 27 image-derived features per model, alongside 12 laboratory-measured physio-morphological traits. Multivariate analyses revealed that R-CNN features capture nuanced colorimetric and structural variations, while YOLOv8 emphasizes morphological characteristics. Machine learning classifiers trained with stratified 10-fold cross-validation achieved up to 95.3% F1-score when combining both feature sets, with R-CNN and YOLOv8 alone attaining 96.9% and 90.8% accuracy, respectively. These findings highlight a trade-off between the superior precision of R-CNN and the real-time scalability of YOLOv8. Our results demonstrate the potential of integrating complementary segmentation-derived features with laboratory metrics to enable robust, non-destructive phenotyping. This work advances the application of vision-based machine learning in precision agriculture, facilitating automated, scalable, and accurate monitoring of fruit maturity and quality. Full article
(This article belongs to the Special Issue Sustainable Practices in Smart Greenhouses)
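
A minimal sketch of the evaluation protocol described above: concatenate the two segmentation-derived feature sets and score a classifier with stratified 10-fold cross-validation on macro F1. The classifier choice and synthetic arrays are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(2)
rcnn_feats = rng.normal(size=(216, 27))   # 27 features per fruit from Mask R-CNN masks
yolo_feats = rng.normal(size=(216, 27))   # 27 features per fruit from YOLOv8n-seg masks
stage = rng.integers(0, 4, size=216)      # four ripening stages (synthetic labels)

X = np.hstack([rcnn_feats, yolo_feats])   # combined feature set
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
f1 = cross_val_score(RandomForestClassifier(random_state=0), X, stage,
                     cv=cv, scoring="f1_macro")
print(f"stratified 10-fold macro F1: {f1.mean():.3f}")
```
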
36 pages, 3544 KB  
Article
Distinguishing a Drone from Birds Based on Trajectory Movement and Deep Learning
by Andrii Nesteruk, Valerii Nikitin, Yosyp Albrekht, Łukasz Ścisło, Damian Grela and Paweł Król
Sensors 2026, 26(3), 755; https://doi.org/10.3390/s26030755 (registering DOI) - 23 Jan 2026
Abstract
Unmanned aerial vehicles (UAVs) increasingly share low-altitude airspace with birds, making early discrimination between drones and biological targets critical for safety and security. This work addresses long-range scenarios where objects occupy only a few pixels and appearance-based recognition becomes unreliable. We develop a model-driven simulation pipeline that generates synthetic data with a controlled camera model, atmospheric background and realistic motion of three aerial target types: multicopter, fixed-wing UAV and bird. From these sequences, each track is encoded as a time series of image-plane coordinates and apparent size, and a bidirectional long short-term memory (LSTM) network is trained to classify trajectories as drone-like or bird-like. The model learns characteristic differences in smoothness, turning behavior and velocity fluctuations, and achieves reliable separation between drone and bird motion patterns on synthetic test data. Motion-trajectory cues alone can support early discrimination of drones from birds when visual details are scarce, providing a complementary signal to conventional image-based detection. The proposed synthetic-data and sequence-classification pipeline forms a reproducible testbed that can be extended with real trajectories from radar or video tracking systems and used to prototype and benchmark trajectory-based recognizers for integrated surveillance solutions. The proposed method is designed to generalize naturally to real surveillance systems, as it relies on trajectory-level motion patterns rather than appearance-based features that are sensitive to sensor quality, illumination, or weather conditions. Full article
(This article belongs to the Section Industrial Sensors)
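
The trajectory classifier is described only architecturally; the sketch below shows a generic bidirectional LSTM over (x, y, apparent size) sequences that outputs a drone/bird decision. Layer sizes are assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class TrajectoryBiLSTM(nn.Module):
    """Bidirectional LSTM that classifies a track of image-plane (x, y, size) samples as
    drone-like or bird-like -- a generic sketch of the sequence classifier described above."""

    def __init__(self, input_dim=3, hidden=64, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, tracks):                 # (batch, time, 3)
        out, _ = self.lstm(tracks)
        return self.head(out[:, -1, :])        # classify from the last time step

model = TrajectoryBiLSTM()
logits = model(torch.randn(8, 120, 3))         # 8 tracks, 120 frames each
print(logits.shape)                            # torch.Size([8, 2])
```
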
15 pages, 8780 KB  
Article
Quantitative Analysis of Arsenic- and Sucrose-Induced Liver Collagen Remodeling Using Machine Learning on Second-Harmonic Generation Microscopy Images
by Mónica Maldonado-Terrón, Julio César Guerrero-Lara, Rodrigo Felipe-Elizarraras, C. Mateo Frausto-Avila, Jose Pablo Manriquez-Amavizca, Myrian Velasco, Zeferino Ibarra Borja, Héctor Cruz-Ramírez, Ana Leonor Rivera, Marcia Hiriart, Mario Alan Quiroz-Juárez and Alfred B. U’Ren
Cells 2026, 15(3), 214; https://doi.org/10.3390/cells15030214 (registering DOI) - 23 Jan 2026
Abstract
Non-alcoholic fatty liver disease (NAFLD) is a silent condition that can lead to fatal cirrhosis, with dietary factors playing a central role. The effects of various dietary interventions on male Wistar rats were evaluated across four diets: control, arsenic, sucrose, and arsenic–sucrose. Second-harmonic generation (SHG) microscopy images from the right ventral lobe of the liver tissue were analyzed with a neural network trained to detect the presence or absence of collagen fibers, followed by the assessment of their orientation and angular distribution. Machine learning classification of SHG microscopy images revealed a marked increase in fibrosis risk with dietary interventions: <10% in controls, 24% with arsenic, 40% with sucrose, and 62% with combined arsenic–sucrose intake. The angular width distribution of collagen fibers narrowed dramatically across groups: 26° (control), 24° (arsenic), 15.7° (sucrose), and 2.8° (arsenic–sucrose). This analysis revealed four key statistical features for classifying the images according to the presence or absence of collagen fibers: (1) the percentage of pixels whose intensity is above the 15% noise threshold, (2) the Mean-to-Standard Deviation ratio (Mean/std), (3) the mode, and (4) the total intensity (sum). These results demonstrate that a diet rich in sucrose, particularly in combination with arsenic, constitutes a significant risk factor for liver collagen fiber remodeling. Full article
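
The four discriminative statistics listed in the abstract are simple enough to compute directly; the sketch below evaluates them for one SHG intensity image. The exact threshold convention and the integer-histogram mode are assumptions about typical practice, not the authors' code.

```python
import numpy as np

def shg_features(image):
    """Return the four scalar features for one SHG image (2-D intensity array)."""
    img = image.astype(float)
    threshold = 0.15 * img.max()                      # assumed 15% (of max) noise threshold
    pct_above = 100.0 * np.mean(img > threshold)      # % pixels above the noise threshold
    mean_to_std = img.mean() / img.std()              # Mean/std ratio
    values, counts = np.unique(img.astype(int), return_counts=True)
    mode = values[np.argmax(counts)]                  # most frequent intensity value
    total = img.sum()                                 # total intensity (sum)
    return pct_above, mean_to_std, mode, total

demo = np.random.default_rng(3).integers(0, 256, size=(512, 512))
print(shg_features(demo))
```
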
26 pages, 11143 KB  
Article
MISA-Net: Multi-Scale Interaction and Supervised Attention Network for Remote-Sensing Image Change Detection
by Haoyu Yin, Junzhe Wang, Shengyan Liu, Yuqi Wang, Yi Liu, Tengyue Guo and Min Xia
Remote Sens. 2026, 18(2), 376; https://doi.org/10.3390/rs18020376 (registering DOI) - 22 Jan 2026
Abstract
Change detection in remote sensing imagery plays a vital role in land use analysis, disaster assessment, and ecological monitoring. However, existing remote sensing change detection methods often lack a structured and tightly coupled interaction paradigm to jointly reconcile multi-scale representation, bi-temporal discrimination, and fine-grained boundary modeling under practical computational constraints. To address this fundamental challenge, we propose a Multi-scale Interaction and Supervised Attention Network (MISANet). To improve the model’s ability to perceive changes at multiple scales, we design a Progressive Multi-Scale Feature Fusion Module (PMFFM), which employs a progressive fusion strategy to effectively integrate multi-granular cross-scale features. To enhance the interaction between bi-temporal features, we introduce a Difference-guided Gated Attention Interaction (DGAI) module. This component leverages difference information between the two time phases and employs a gating mechanism to retain fine-grained details, thereby improving semantic consistency. Furthermore, to guide the model’s focus on change regions, we design a Supervised Attention Decoder Module (SADM). This module utilizes a channel–spatial joint attention mechanism to reweight the feature maps. In addition, a deep supervision strategy is incorporated to direct the model’s attention toward both fine-grained texture differences and high-level semantic changes during training. Experiments conducted on the LEVIR-CD, SYSU-CD, and GZ-CD datasets demonstrate the effectiveness of our method, achieving F1-scores of 91.19%, 82.25%, and 88.35%, respectively. Compared with the state-of-the-art BASNet model, MISANet achieves performance gains of 0.50% F1 and 0.85% IoU on LEVIR-CD, 2.13% F1 and 3.02% IoU on SYSU-CD, and 1.28% F1 and 2.03% IoU on GZ-CD. The proposed method demonstrates strong generalization capabilities and is applicable to various complex change detection scenarios. Full article
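
For reference, the F1 and IoU figures quoted above are pixel-level metrics over binary change maps; the sketch below computes both from predicted and ground-truth masks. This is a generic evaluation helper, not part of MISANet.

```python
import numpy as np

def change_detection_scores(pred, gt):
    """pred, gt: boolean arrays of the same shape (True = changed pixel)."""
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    precision = tp / (tp + fp + 1e-9)
    recall = tp / (tp + fn + 1e-9)
    f1 = 2 * precision * recall / (precision + recall + 1e-9)
    iou = tp / (tp + fp + fn + 1e-9)
    return f1, iou

rng = np.random.default_rng(4)
pred = rng.random((256, 256)) > 0.5   # stand-in predicted change map
gt = rng.random((256, 256)) > 0.5     # stand-in ground-truth change map
print(change_detection_scores(pred, gt))
```
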
22 pages, 2759 KB  
Article
DACL-Net: A Dual-Branch Attention-Based CNN-LSTM Network for DOA Estimation
by Wenjie Xu and Shichao Yi
Sensors 2026, 26(2), 743; https://doi.org/10.3390/s26020743 (registering DOI) - 22 Jan 2026
Abstract
While deep learning methods are increasingly applied in the field of DOA estimation, existing approaches generally feed the real and imaginary parts of the covariance matrix directly into neural networks without optimizing the input features, which prevents classical attention mechanisms from improving accuracy. This paper proposes a spatio-temporal fusion model named DACL-Net for DOA estimation. The spatial branch applies a two-dimensional Fourier transform (2D-FT) to the covariance matrix, causing angles to appear as peaks in the magnitude spectrum. This operation transforms the original covariance matrix into a dark image with bright spots, enabling the convolutional neural network (CNN) to focus on the bright-spot components via an attention module. Additionally, a spectrum attention mechanism (SAM) is introduced to enhance the extraction of temporal features in the time branch. The model learns simultaneously from two data branches and finally outputs DOA results through a linear layer. Simulation results demonstrate that DACL-Net outperforms existing algorithms in terms of accuracy, achieving an RMSE of 0.04 at an SNR of 0 dB. Full article
(This article belongs to the Section Communications)
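
The spatial-branch preprocessing (a 2D Fourier transform of the covariance matrix) can be illustrated with a small simulation: build the sample covariance of a uniform linear array and take the magnitude of its 2-D FFT, which concentrates each source into a bright spot as the abstract describes. Array size, element spacing, and SNR below are assumptions.

```python
import numpy as np

M, N_SNAP, SNR_DB = 8, 200, 0          # sensors, snapshots, signal-to-noise ratio (assumed)
angles_deg = np.array([-20.0, 35.0])   # true DOAs for this toy simulation

rng = np.random.default_rng(5)
m = np.arange(M)[:, None]
steering = np.exp(1j * np.pi * m * np.sin(np.deg2rad(angles_deg)))   # half-wavelength spacing
signals = (rng.normal(size=(2, N_SNAP)) + 1j * rng.normal(size=(2, N_SNAP))) / np.sqrt(2)
noise_pow = 10 ** (-SNR_DB / 10)
noise = np.sqrt(noise_pow / 2) * (rng.normal(size=(M, N_SNAP)) + 1j * rng.normal(size=(M, N_SNAP)))
x = steering @ signals + noise                                       # array snapshots

R = x @ x.conj().T / N_SNAP                                          # sample covariance (M x M)
spectrum = np.abs(np.fft.fftshift(np.fft.fft2(R, s=(64, 64))))       # "dark image with bright spots"
print(spectrum.shape, spectrum.max())
```
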
23 pages, 12105 KB  
Article
Fusion Framework of Remote Sensing and Electromagnetic Scattering Features of Drones for Monitoring Freighters
by Zeyang Zhou and Jun Huang
Drones 2026, 10(1), 74; https://doi.org/10.3390/drones10010074 (registering DOI) - 22 Jan 2026
Abstract
Certain types of unmanned aerial vehicles (UAVs) represent convenient platforms for remote sensing observation as well as low-altitude targets that are themselves monitored by other devices. To study the remote sensing grayscale and radar cross-section (RCS) of an example drone, we present a fusion framework based on remote sensing imaging and electromagnetic scattering calculations. The results indicate that the quadcopter drone shows weak visual effects in remote sensing grayscale images while exhibiting strong dynamic electromagnetic scattering features, with fluctuations that can exceed 29.6815 dBm2. The average and peak RCS of the example UAV are higher than those of the quadcopter in the given cases. The example freighter exhibits the most intuitive grayscale features and the largest RCS mean under the given observation conditions, with a peak of 51.6186 dBm2. Compared to the UAV, the small boat with a sharp bow design has similar dimensions while exhibiting lower RCS features and intuitive remote sensing grayscale. Under cross-scale conditions, grayscale imaging is beneficial for monitoring UAVs, freighters, and other nearby boats. Dynamic RCS features and grayscale local magnification are suitable for locating and recognizing drones. The established approach is effective in learning remote sensing grayscale and electromagnetic scattering features of drones used for observing freighters. Full article
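
For orientation on the dBm2 (dBsm) values quoted above, radar cross-section in decibels is 10·log10 of the RCS in square meters; a minimal conversion helper follows. This is a unit note, not part of the authors' framework.

```python
import numpy as np

def rcs_to_dbsm(sigma_m2):
    """Convert an RCS value in square meters to dBm2 (dBsm)."""
    return 10.0 * np.log10(sigma_m2)

print(rcs_to_dbsm(np.array([1.0, 145.0])))   # 1 m^2 -> 0 dBm2, ~145 m^2 -> ~21.6 dBm2
```
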
24 pages, 7898 KB  
Article
Unifying Aesthetic Evaluation via Multimodal Annotation and Fine-Grained Sentiment Analysis
by Kai Liu, Hangyu Xiong, Jinyi Zhang and Min Peng
Big Data Cogn. Comput. 2026, 10(1), 37; https://doi.org/10.3390/bdcc10010037 (registering DOI) - 22 Jan 2026
Abstract
With the rapid growth of visual content, automated aesthetic evaluation has become increasingly important. However, existing research faces three key challenges: (1) the absence of datasets combining Image Aesthetic Assessment (IAA) scores and Image Aesthetic Captioning (IAC) descriptions; (2) limited integration of quantitative scores and qualitative text, hindering comprehensive modeling; (3) the subjective nature of aesthetics, which complicates consistent fine-grained evaluation. To tackle these issues, we propose a unified multimodal framework. To address the lack of data, we develop the Textual Aesthetic Sentiment Labeling Pipeline (TASLP) for automatic annotation and construct the Reddit Multimodal Sentiment Dataset (RMSD) with paired IAA and IAC labels. To improve annotation integration, we introduce the Aesthetic Category Sentiment Analysis (ACSA) task, which models fine-grained aesthetic attributes across modalities. To handle subjectivity, we design two models—LAGA for IAA and ACSFM for IAC—that leverage ACSA features to enhance consistency and interpretability. Experiments on RMSD and public benchmarks show that our approach alleviates data limitations and delivers competitive performance, highlighting the effectiveness of fine-grained sentiment modeling and multimodal learning in aesthetic evaluation. Full article
(This article belongs to the Special Issue Machine Learning and Image Processing: Applications and Challenges)
26 pages, 4614 KB  
Article
CHARMS: A CNN-Transformer Hybrid with Attention Regularization for MRI Super-Resolution
by Xia Li, Haicheng Sun and Tie-Qiang Li
Sensors 2026, 26(2), 738; https://doi.org/10.3390/s26020738 (registering DOI) - 22 Jan 2026
Abstract
Magnetic resonance imaging (MRI) super-resolution (SR) enables high-resolution reconstruction from low-resolution acquisitions, reducing scan time and easing hardware demands. However, most deep learning-based SR models are large and computationally heavy, limiting deployment in clinical workstations, real-time pipelines, and resource-restricted platforms such as low-field and portable MRI. We introduce CHARMS, a lightweight convolutional–Transformer hybrid with attention regularization optimized for MRI SR. CHARMS employs a Reverse Residual Attention Fusion backbone for hierarchical local feature extraction, Pixel–Channel and Enhanced Spatial Attention for fine-grained feature calibration, and a Multi-Depthwise Dilated Transformer Attention block for efficient long-range dependency modeling. Novel attention regularization suppresses redundant activations, stabilizes training, and enhances generalization across contrasts and field strengths. Across IXI, Human Connectome Project Young Adult, and paired 3T/7T datasets, CHARMS (~1.9M parameters; ~30 GFLOPs for 256 × 256) surpasses leading lightweight and hybrid baselines (EDSR, PAN, W2AMSN-S, and FMEN) by 0.1–0.6 dB PSNR and up to 1% SSIM at ×2/×4 upscaling, while reducing inference time ~40%. Cross-field fine-tuning yields 7T-like reconstructions from 3T inputs with ~6 dB PSNR and 0.12 SSIM gains over native 3T. With near-real-time performance (~11 ms/slice, ~1.6–1.9 s per 3D volume on RTX 4090), CHARMS offers a compelling fidelity–efficiency balance for clinical workflows, accelerated protocols, and portable MRI. Full article
(This article belongs to the Special Issue Sensing Technologies in Digital Radiology and Image Analysis)
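
The PSNR/SSIM gains reported above are standard full-reference image-quality metrics; the sketch below computes both with scikit-image on a reconstructed slice versus ground truth. The arrays are synthetic stand-ins for MRI slices.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

rng = np.random.default_rng(6)
ground_truth = rng.random((256, 256))                                   # stand-in HR slice
reconstruction = np.clip(ground_truth + rng.normal(scale=0.02, size=(256, 256)), 0, 1)

psnr = peak_signal_noise_ratio(ground_truth, reconstruction, data_range=1.0)
ssim = structural_similarity(ground_truth, reconstruction, data_range=1.0)
print(f"PSNR: {psnr:.2f} dB  SSIM: {ssim:.3f}")
```
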
30 pages, 1726 KB  
Article
A Sensor-Oriented Multimodal Medical Data Acquisition and Modeling Framework for Tumor Grading and Treatment Response Analysis
by Linfeng Xie, Shanhe Xiao, Bihong Ming, Zhe Xiang, Zibo Rui, Xinyi Liu and Yan Zhan
Sensors 2026, 26(2), 737; https://doi.org/10.3390/s26020737 (registering DOI) - 22 Jan 2026
Abstract
In precision oncology research, achieving joint modeling of tumor grading and treatment response, together with interpretable mechanism analysis, based on multimodal medical imaging and clinical data remains a challenging and critical problem. From a sensing perspective, these imaging and clinical data can be regarded as heterogeneous sensor-derived signals acquired by medical imaging sensors and clinical monitoring systems, providing continuous and structured observations of tumor characteristics and patient states. Existing approaches typically rely on invasive pathological grading, while grading prediction and treatment response modeling are often conducted independently. Moreover, multimodal fusion procedures generally lack explicit structural constraints, which limits their practical utility in clinical decision-making. To address these issues, a grade-guided multimodal collaborative modeling framework was proposed. Built upon mature deep learning models, including 3D ResNet-18, MLP, and CNN–Transformer, tumor grading was incorporated as a weakly supervised prior into the processes of multimodal feature fusion and treatment response modeling, thereby enabling an integrated solution for non-invasive grading prediction, treatment response subtype discovery, and intrinsic mechanism interpretation. Through a grade-guided feature fusion mechanism, discriminative information that is highly correlated with tumor malignancy and treatment sensitivity is emphasized in the multimodal joint representation, while irrelevant features are suppressed to prevent interference with model learning. Within a unified framework, grading prediction and grade-conditioned treatment response modeling are jointly realized. Experimental results on real-world clinical datasets demonstrate that the proposed method achieved an accuracy of 84.6% and a kappa coefficient of 0.81 in the tumor-grading prediction task, indicating a high level of consistency with pathological grading. In the treatment response prediction task, the proposed model attained an AUC of 0.85, a precision of 0.81, and a recall of 0.79, significantly outperforming single-modality models, conventional early-fusion models, and multimodal CNN–Transformer models without grading constraints. In addition, treatment-sensitive and treatment-resistant subtypes identified under grading conditions exhibited stable and significant stratification differences in clustering consistency and survival analysis, validating the potential value of the proposed approach for clinical risk assessment and individualized treatment decision-making. Full article
(This article belongs to the Special Issue Application of Optical Imaging in Medical and Biomedical Research)
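
The agreement and discrimination metrics cited above (Cohen's kappa for grading, ROC AUC for treatment response) can be computed as in the sketch below; the labels and scores are synthetic placeholders, not clinical data.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, roc_auc_score

rng = np.random.default_rng(7)
true_grade = rng.integers(1, 5, size=200)                          # pathological grades I-IV
pred_grade = np.where(rng.random(200) < 0.85, true_grade,          # mostly-correct predictions
                      rng.integers(1, 5, size=200))
print("kappa:", round(cohen_kappa_score(true_grade, pred_grade), 2))

response = rng.integers(0, 2, size=200)                            # treatment response yes/no
score = response * 0.6 + rng.random(200) * 0.4                     # synthetic model scores
print("AUC:", round(roc_auc_score(response, score), 2))
```
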
20 pages, 822 KB  
Article
Dermatology “AI Babylon”: Cross-Language Evaluation of AI-Crafted Dermatology Descriptions
by Emmanouil Karampinis, Christina-Marina Zoumpourli, Christina Kontogianni, Theofanis Arkoumanis, Dimitra Koumaki, Dimitrios Mantzaris, Konstantinos Filippakis, Maria-Myrto Papadopoulou, Melpomeni Theofili, Nkechi Anne Enechukwu, Nomtondo Amina Ouédraogo, Alexandros Katoulis, Efterpi Zafiriou and Dimitrios Sgouros
Medicina 2026, 62(1), 227; https://doi.org/10.3390/medicina62010227 (registering DOI) - 22 Jan 2026
Abstract
Background and Objectives: Dermatology relies on a complex terminology encompassing lesion types, distribution patterns, colors, and specialized sites such as hair and nails, while dermoscopy adds an additional descriptive framework, making interpretation subjective and challenging. Our study aims to evaluate the ability of a chatbot (Gemini 2) to generate dermatology descriptions across multiple languages and image types, and to assess the influence of prompt language on readability, completeness, and terminology consistency. Our research is based on the concept that non-English prompts are not mere translations of the English prompts but are independently generated texts that reflect medical and dermatological knowledge learned from non-English material used in the chatbot’s training. Materials and Methods: Five macroscopic and five dermoscopic images of common skin lesions were used. Images were uploaded to Gemini 2 with language-specific prompts requesting short paragraphs describing visible features and possible diagnoses. A total of 2400 outputs were analyzed for readability using LIX score and CLEAR (comprehensiveness, accuracy, evidence-based content, appropriateness, and relevance) assessment, while terminology consistency was evaluated via SNOMED CT mapping across English, French, German, and Greek outputs. Results: English and French descriptions were found to be harder to read and more sophisticated, while SNOMED CT mapping revealed the largest terminology mismatch in German and the smallest in French. English texts and macroscopic images achieved the highest accuracy, completeness, and readability based on CLEAR assessment, whereas dermoscopic images and non-English texts presented greater challenges. Conclusions: Overall, partial terminology inconsistencies and cross-lingual variations highlighted that the language of the prompt plays a critical role in shaping AI-generated dermatology descriptions. Full article
(This article belongs to the Special Issue Dermato-Engineering and AI Assessment in Dermatology Practice)
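
The LIX readability score used in this study has a simple closed form: average sentence length plus the percentage of words longer than six letters. The sketch below computes it with a naive tokenizer; the study's exact preprocessing is not specified, so the regex choices are assumptions.

```python
import re

def lix(text):
    """LIX = (words / sentences) + 100 * (long words > 6 letters / words)."""
    words = re.findall(r"[^\W\d_]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100.0 * len(long_words) / len(words)

sample = ("The lesion is a well-demarcated, hyperpigmented macule on sun-exposed skin. "
          "Dermoscopy shows a regular pigment network without atypical structures.")
print(round(lix(sample), 1))
```
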