
Search Results (228)

Search Parameters:
Keywords = image metadata

19 pages, 1651 KB  
Article
Differential Diagnosis of Parotid Tumors on Ultrasound: Interobserver Variability and Examiner-Specific Decision Rules—A Machine Learning Approach
by Lukas Pillong, Ida Ohnesorg, Lukas Alexander Brust, Jan Palm, Julia Schulze-Berge, Victoria Bozzato, Manfred Voges, Adrian Müller, Malvina Garner and Alessandro Bozzato
Diagnostics 2026, 16(6), 880; https://doi.org/10.3390/diagnostics16060880 - 16 Mar 2026
Viewed by 263
Abstract
Background/Objectives: Noninvasive differentiation of parotid gland tumors remains challenging despite ultrasound being the primary imaging modality for salivary gland lesions. Given its examiner dependence, improving diagnostic consistency and transparency is crucial. We quantified interobserver variability in parotid ultrasound, modeled examiner-specific decision patterns using machine learning surrogates, and tested whether surrogate complexity relates to examiner performance. Methods: In this retrospective, single-center study, six examiners independently rated ultrasound images of 149 parotid tumors using predefined descriptors. Performance was summarized using accuracy and the area under the receiver operating characteristic curve (AUC), with 95% confidence intervals (CIs). AUCs were compared using DeLong tests (Holm-adjusted). Interobserver agreement was assessed using pairwise Cohen’s and global Fleiss’ κ. For each examiner, a decision-tree surrogate was trained from structured descriptors and clinical metadata to reproduce examiner labels and visualize decision pathways; performance was estimated by 5-fold cross-validation. Results: Examiner accuracy ranged from 63.5% to 90.5% and AUC from 0.66 to 0.89 (best 0.89, 95% CI 0.83–0.95); the best performer exceeded the two lowest performers (p < 0.001). Agreement was higher for objective descriptors (size: κ = 0.57–0.97) than for subjective descriptors (echogenicity: κ = 0.11–0.79). Surrogate decision-tree accuracy versus histopathology ranged from 57.2% to 80.0% for unpruned and from 65.1% to 76.5% for pruned models, with high coverage (95.3–98.7%). Tree complexity showed no consistent association with examiner performance. Conclusions: Parotid ultrasound shows substantial interobserver variability. Interpretable surrogates can approximate individual labeling behavior from structured descriptors and clinical metadata, making examiner-dependent decision patterns explicit. Full article
(This article belongs to the Special Issue Machine Learning for Medical Image Processing and Analysis in 2026)
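The pairwise agreement statistic reported in this abstract has a compact textbook form; the sketch below is the generic Cohen's κ definition, not the authors' analysis code, and the toy ratings are invented for illustration.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters over the same cases: observed
    agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    categories = sorted(set(rater_a) | set(rater_b))
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (p_observed - p_expected) / (1 - p_expected)

# Two hypothetical examiners labelling 8 lesions as benign (0) or malignant (1)
kappa = cohens_kappa([0, 0, 1, 1, 0, 1, 0, 1], [0, 0, 1, 0, 0, 1, 1, 1])
```

Fleiss' κ generalizes the same chance-correction idea to more than two raters.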

23 pages, 17441 KB  
Article
A Method for Automated Crop Health Monitoring in Large Areas Using Multi-Spectral Images and Deep Convolutional Neural Networks
by Oscar Andrés Martínez, Kevin David Ortega Quiñones and German Andrés Holguin-Londoño
AgriEngineering 2026, 8(3), 109; https://doi.org/10.3390/agriengineering8030109 - 13 Mar 2026
Viewed by 273
Abstract
Crop monitoring over large land extensions represents a central challenge in precision agriculture, especially in polyculture contexts where species with different nutritional needs are combined. This study presents a methodology to manage and analyze large volumes of multispectral images captured by unmanned aerial vehicles (UAVs) in order to identify and monitor crops at the plant level. The images are efficiently stored and retrieved using a Hilbert Curve, which reduces the complexity of the search process from O(n²) to O(log(n)), where n represents the number of indexed data points. The system connects to a distributed Structured Query Language (SQL) database, allowing for fast image retrieval based on GPS coordinates and other metadata. Additionally, the Normalized Difference Vegetation Index (NDVI) is calculated using reflectance data from the red and near-infrared channels, adjusted by semantic segmentation masks generated with a U-Net model, which allows for species-specific evaluations. The methodology was evaluated on a 20,000 m² polyculture farm with coffee, avocado, and plantain crops, using a dataset of 270 aerial images partitioned into 70% for training and 30% for validation. The results show improvements in retrieval speed and precision with the Hilbert Space-Filling Curve (HSFC) approach, and an accuracy of 82.3% and a Mean Intersection over Union (MIoU) of 68.4% in species detection with the U-Net model. Overall, this integrated framework demonstrates scalable potential for precision agriculture in complex polyculture systems, facilitating efficient data management and targeted crop interventions. Full article
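The two computations named in this abstract, Hilbert-curve indexing and NDVI, have standard compact forms. The sketch below uses the classic bitwise Hilbert mapping and the usual NDVI band ratio; it is a generic illustration, not the authors' implementation.

```python
def hilbert_index(n, x, y):
    """Distance along the Hilbert curve of cell (x, y) on an n x n grid
    (n a power of two). Classic bitwise algorithm: nearby cells usually
    map to nearby indices, so spatial queries become 1-D range scans."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate the quadrant so curve orientation matches
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def ndvi(nir, red):
    """Normalized Difference Vegetation Index from band reflectances."""
    return (nir - red) / (nir + red)
```

Sorting image tiles by `hilbert_index` is what lets a database answer "all tiles near these GPS coordinates" with a contiguous index scan.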

18 pages, 4935 KB  
Article
Forensic Analysis for Source Camera Identification from EXIF Metadata
by Pengpeng Yang, Chen Zhou, Daniele Baracchi, Dasara Shullani, Yaobin Zou and Alessandro Piva
J. Imaging 2026, 12(3), 110; https://doi.org/10.3390/jimaging12030110 - 4 Mar 2026
Viewed by 464
Abstract
Source camera identification on smartphones constitutes a fundamental task in multimedia forensics, providing essential support for applications such as image copyright protection, illegal content tracking, and digital evidence verification. Numerous techniques have been developed for this task over the past decades. Among existing approaches, Photo-Response Non-Uniformity (PRNU) has been widely recognized as a reliable device-specific fingerprint and has demonstrated remarkable performance in real-world applications. Nevertheless, the rapid advancement of computational photography technologies has introduced significant challenges: modern devices often exhibit anomalous behaviors under PRNU-based analysis. For instance, images captured by different devices may exhibit unexpected correlations, while images captured by the same device can vary substantially in their PRNU patterns. Current approaches are incapable of automatically exploring the underlying causes of these anomalous behaviors. To address this limitation, we propose a simple yet effective forensic analysis framework leveraging Exchangeable Image File Format (EXIF) metadata. Specifically, we represent EXIF metadata as type-aware word embeddings to preserve contextual information across tags. This design enables visual interpretation of the model’s decision-making process and provides complementary insights for identifying the anomalous behaviors observed in modern devices. Extensive experiments conducted on three public benchmark datasets demonstrate that the proposed method not only achieves state-of-the-art performance for source camera identification but also provides valuable insights into anomalous device behaviors. Full article
(This article belongs to the Section Biometrics, Forensics, and Security)
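As a toy illustration of keeping tag context attached to values before embedding, the sketch below turns EXIF tag/value pairs into tokens that carry the tag name and coarsely bucket numeric values. The bucketing scheme, threshold, and tag names here are hypothetical; the paper's type-aware encoding is a learned embedding, not this rule.

```python
def exif_tokens(exif):
    """Toy type-aware tokenization of EXIF tag/value pairs (illustrative
    only): string values pass through with their tag name attached,
    numeric values are reduced to a coarse bucket so a downstream
    embedding layer sees a bounded vocabulary."""
    tokens = []
    for tag, value in sorted(exif.items()):
        if isinstance(value, (int, float)):
            bucket = "high" if value >= 1000 else "low"  # hypothetical cut
            tokens.append(f"{tag}={bucket}")
        else:
            tokens.append(f"{tag}={value}")
    return tokens

tokens = exif_tokens({"Model": "PixelX", "ISOSpeedRatings": 1600,
                      "ExposureTime": 0.01})
```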

15 pages, 4799 KB  
Article
The USGS Rotating X-Ray Computed Tomography (RXCT) Coral-Core Archive: Scope, Access, and Standardization
by Ferdinand K. J. Oberle, Lauren T. Toth, Nancy G. Prouty, Brooke Santos, Jessica A. Jacobs, Sierra Bloomer, Kian Bagheri, Breanna N. Williams, Jason S. Padgett, Anastasios Stathakopoulos and SeanPaul La Selle
J. Mar. Sci. Eng. 2026, 14(5), 490; https://doi.org/10.3390/jmse14050490 - 4 Mar 2026
Viewed by 345
Abstract
We announce the U.S. Geological Survey (USGS) Rotating X-ray Computed Tomography (RXCT) Coral-Core Archive, a digital resource derived from ~360 coral reef cores curated at the USGS Pacific and St. Petersburg Coastal and Marine Science Centers. The archive delivers calibrated 3-dimensional image volumes that enable reproducible values of skeletal density, linear extension, and calcification from decadal- to centennial-scale records of coral growth and bioerosion. Cross-study comparability within the archive is supported by a unified RXCT workflow that minimizes imaging artifacts. This includes rejecting image-intensity–density calibrations with r2 < 0.95, back-calculating standard densities to verify a ±10% target precision, and confirming that band-averaged density values fall within published species- and site-specific ranges. Our release of data under FAIR (Findable, Accessible, Interoperable, Reusable) principles is important given global coral reef decline and the rarity of physical coral archives. Calibrated imagery and scan metadata are distributed through CoralCache/CoralCT for analysis (DOI: 10.5194/essd-2025-598), while core locations and collection metadata are published through the USGS Geologic Core and Sample Database (DOI: 10.5066/F7319TR3) with links to CT imagery in a USGS ScienceBase repository (DOI: 10.5066/P139Y9H4). This archive provides a powerful dataset for evaluating environmental controls on coral growth, establishing restoration baselines, and improving coastal hazard assessments in the face of global coral reef declines. Full article
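The two quality-control rules quoted in this abstract (reject calibrations with r² < 0.95, back-calculate standards to verify the ±10% precision target) translate directly into code. This is a generic least-squares r² plus a tolerance check, not the USGS workflow software, and the sample numbers are invented.

```python
def r_squared(xs, ys):
    """Coefficient of determination of the ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

def accept_calibration(intensities, densities, known, measured):
    """Accept a scan only if the intensity-density fit is tight enough
    (r^2 >= 0.95) and the back-calculated standard density lands inside
    the +/- 10% target precision."""
    return (r_squared(intensities, densities) >= 0.95
            and abs(measured - known) / known <= 0.10)

ok = accept_calibration([1, 2, 3], [2, 4, 6], known=1.0, measured=1.05)
```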

21 pages, 769 KB  
Article
Tabular-to-Image Encoding Methods for Melanoma Detection: A Proof-of-Concept
by Vanesa Gómez-Martínez, David Chushig-Muzo and Cristina Soguero-Ruiz
Appl. Sci. 2026, 16(5), 2459; https://doi.org/10.3390/app16052459 - 3 Mar 2026
Viewed by 278
Abstract
Deep learning (DL) models have demonstrated strong performance in dermatological applications, particularly when trained on dermoscopic images. In contrast, tabular clinical data—such as patient metadata and lesion-level descriptors—are difficult to integrate into DL-based pipelines due to their heterogeneous, non-spatial, and often low-dimensional nature. As a result, these data are commonly handled using separate classical machine learning (ML) models. In this work, we present a proof-of-concept study that investigates whether dermatological tabular data can be transformed into two-dimensional image representations to enable convolutional neural network (CNN)-based learning. To this end, we employ the Low Mixed-Image Generator for Tabular Data (LM-IGTD), a framework designed to transform low-dimensional and heterogeneous tabular data into two-dimensional image representations, through type-aware encoding and controlled feature augmentation. Using this approach, we encode low-dimensional clinical metadata, high-dimensional lesion-level statistical features extracted from dermoscopic images, as well as their feature-level fusion, into grayscale image representations. The resulting image representations serve as input to CNNs, and the performance is compared with ML models trained on tabular data. Experiments conducted on the Derm7pt and PH2 datasets show that traditional ML models generally achieve the highest Area Under the Curve values, while LM-IGTD-based representations provide comparable performance and enable the use of CNNs on tabular clinical data used in dermatology. Full article
(This article belongs to the Special Issue Latest Research on Computer Vision and Image Processing, 2nd Edition)
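The core idea of a tabular-to-image encoder can be illustrated with a deliberately naive sketch: scale the feature vector to 8-bit intensities and lay it out row-major on the smallest square grid. LM-IGTD optimizes the pixel assignment and adds controlled feature augmentation; this placeholder layout is not that method.

```python
import math

def vector_to_image(features):
    """Naive tabular-to-image stand-in: min-max scale features to
    [0, 255] and place them row-major on the smallest square grid,
    zero-padding the tail, so a CNN can consume the result."""
    lo, hi = min(features), max(features)
    scaled = [round(255 * (f - lo) / (hi - lo)) for f in features]
    side = math.ceil(math.sqrt(len(scaled)))
    padded = scaled + [0] * (side * side - len(scaled))
    return [padded[r * side:(r + 1) * side] for r in range(side)]

img = vector_to_image([0.0, 0.5, 1.0, 0.25, 0.75])
```

Even this trivial layout shows the interface the paper relies on: any fixed, deterministic feature-to-pixel mapping lets the same tabular record be fed to image models and to classical ML side by side.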

29 pages, 3428 KB  
Article
Scalable Unimodal and Multimodal Deep Learning for Multi-Label Chest Disease Detection: A Comparative Analysis
by Diğdem Orhan, Murat Ucan, Reda Alhajj and Mehmet Kaya
Diagnostics 2026, 16(5), 734; https://doi.org/10.3390/diagnostics16050734 - 1 Mar 2026
Viewed by 319
Abstract
Background/Objectives: Early and accurate diagnosis of chest diseases is a critical challenge in clinical practice, particularly in scenarios where multiple pathologies may coexist. While deep learning-based medical image analysis has shown promising results, most existing studies rely on unimodal data and fixed-scale datasets, limiting their generalizability and clinical relevance. In this study, we present a comprehensive comparative analysis of unimodal and multimodal deep learning models for multi-label chest disease classification using chest X-ray images and associated clinical metadata. Methods: A total of twelve models were developed based on three widely used convolutional neural network architectures—ResNet50, EfficientNetB3, and DenseNet121—under both unimodal (image-only) and multimodal (image + clinical data) configurations. To systematically investigate the impact of data scale, experiments were conducted on two distinct versions: the Random Sample of NIH Chest X-ray Dataset and the NIH Chest X-ray Dataset, containing 5606 and 121,120 samples, respectively. Model performance was evaluated using label-based Area Under the Receiver Operating Characteristic Curve (AUROC) metrics. Results: Experimental results demonstrate that multimodal fusion consistently outperforms unimodal approaches across all architectures and data scales, with more pronounced improvements observed in large-scale settings. Furthermore, increasing data volume leads to improved generalization and reduced performance variance, particularly for rare pathologies. Conclusions: These findings highlight the effectiveness of multimodal, multi-label learning in enhancing diagnostic accuracy and support the development of robust clinical decision support systems for chest disease assessment. Full article
(This article belongs to the Special Issue Artificial Intelligence and Big Data in Digestive Healthcare)
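The per-label AUROC used for evaluation equals the probability that a random positive case is scored above a random negative one. A minimal rank-based sketch (generic metric code, not the study's pipeline; the scores are invented):

```python
def auroc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic: the
    fraction of (positive, negative) pairs ranked correctly, with ties
    counted as half a win."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# One label of a multi-label problem, scored by a model
score = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

For multi-label chest X-ray evaluation this is computed once per pathology and then averaged or reported label by label.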

29 pages, 2747 KB  
Article
Standardization of Neuromuscular Reflex Analysis—Role of Fine-Tuned Vision-Language Model Consortium and OpenAI gpt-oss Reasoning LLM-Enabled Decision Support System
by Eranga Bandara, Ross Gore, Sachin Shetty, Ravi Mukkamala, Christopher K. Rhea, Brittany S. Samulski, Amin Hass, Atmaram Yarlagadda, Shaifali Kaushik, Malith De Silva, Andriy Maznychenko, Inna Sokolowska and Kasun De Zoysa
Biomechanics 2026, 6(1), 23; https://doi.org/10.3390/biomechanics6010023 - 27 Feb 2026
Viewed by 401
Abstract
Background/Objectives: Accurate assessment of neuromuscular reflexes, such as the Hoffmann reflex (H-reflex), plays a critical role in sports science, rehabilitation, and clinical neurology. Conventional interpretation of H-reflex electromyography (EMG) waveforms is subject to inter-rater variability and interpretive bias, limiting reliability and standardization. This study aims to develop an automated, interpretable, and robust agentic AI–driven framework for H-reflex waveform analysis. Methods: We propose a fine-tuned Vision–Language Model (VLM) consortium combined with a reasoning Large Language Model (LLM)–enabled decision support system for automated H-reflex interpretation. Multiple VLMs were fine-tuned on curated datasets of H-reflex EMG waveform images annotated with expert clinical observations, recovery timelines, and athlete metadata. The VLM outputs were aggregated using a consensus-based strategy and further refined by a specialized reasoning LLM to ensure coherent, transparent, and explainable diagnostic assessments. Model fine-tuning employed Low-Rank Adaptation (LoRA) and 4-bit quantization to enable efficient deployment on consumer-grade hardware. Results: Experimental evaluation demonstrated that the proposed hybrid system delivers accurate, consistent, and clinically interpretable assessments of neuromuscular states, including fatigue, injury, and recovery, directly from EMG waveform images and contextual metadata. Compared with baseline models, the fine-tuned VLM consortium exhibited substantially improved precision, consistency, and contextual awareness, while the reasoning LLM enhanced diagnostic coherence through cross-model consensus and structured reasoning, thereby supporting responsible and explainable AI-driven decision making. Conclusions: This work presents, to the authors’ knowledge, the first integration of a responsible and explainable AI-driven decision support system for H-reflex analysis. 
The proposed framework advances the automation and standardization of neuromuscular diagnostics and establishes a foundation for next-generation AI-assisted decision support systems in sports performance monitoring, rehabilitation, and clinical neurophysiology. Full article
(This article belongs to the Special Issue Biomechanics in Sport and Ageing: Artificial Intelligence)
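The Low-Rank Adaptation step mentioned in this abstract trains a rank-r update B·A on top of a frozen weight W instead of updating W itself. A dependency-free sketch with toy dimensions (not the paper's configuration):

```python
def matmul(A, B):
    """Plain list-of-lists matrix multiply."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_forward(x, W, B, A, scale=1.0):
    """y = x (W + scale * B A) computed without merging the matrices:
    W (in x out) stays frozen; only the low-rank factors B (in x r)
    and A (r x out) are trained, so the update has r*(in+out)
    parameters instead of in*out."""
    base = matmul(x, W)
    low = matmul(matmul(x, B), A)  # cheap: go through the rank-r bottleneck
    return [[b + scale * l for b, l in zip(br, lr)]
            for br, lr in zip(base, low)]

# Toy example: identity base weight, rank-1 update
y = lora_forward([[1, 2]], W=[[1, 0], [0, 1]], B=[[1], [0]], A=[[0, 1]])
```

The 4-bit quantization the authors pair with LoRA compresses the frozen W; the small trainable factors stay in higher precision.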

32 pages, 3485 KB  
Systematic Review
A Systematic Review of Available Multispectral UAV Image Datasets for Precision Agriculture Applications
by Andrea Caroppo, Giovanni Diraco and Alessandro Leone
Remote Sens. 2026, 18(4), 659; https://doi.org/10.3390/rs18040659 - 21 Feb 2026
Viewed by 621
Abstract
The proliferation of Unmanned Aerial Vehicles (UAVs) equipped with multispectral imaging sensors has revolutionized data collection in precision agriculture. These platforms provide high-resolution, temporally dense data crucial for monitoring crop health, optimizing resource management, and predicting yield. However, the development and validation of robust data-driven algorithms, from vegetation index analysis to complex deep learning models, are contingent upon the availability of high-quality, standardized, and publicly accessible datasets. This review systematically surveys and characterizes the current landscape of available datasets containing multispectral imagery acquired by UAVs in agricultural contexts. Following guidelines for reporting systematic reviews and meta-analyses (PRISMA methodology), 39 studies were selected and analyzed, categorizing them based on key attributes including spectral bands (e.g., RGB, Red Edge, Near-Infrared), spatial and temporal resolution, types of crops studied, presence of complementary ground-truth data (e.g., biomass, nitrogen content, yield maps), and the specific agricultural tasks they support (e.g., disease detection, weed mapping, water stress assessment). However, the review underscores a critical gap in standardization, with significant variability in data formats, annotation quality, and metadata completeness, which hampers reproducibility and comparative analysis. Furthermore, we identify a need for more datasets targeting specific challenges like early-stage disease identification and anomaly detection in complex crop canopies. Finally, we discuss future directions for the creation of more comprehensive, benchmark-ready open datasets that will be instrumental in accelerating research, fostering collaboration, and bridging the gap between algorithmic innovation and practical agricultural deployment. 
This work serves as a foundational guide for researchers and practitioners seeking suitable data for their work and contributes to the ongoing effort of standardizing open data practices in agricultural remote sensing. Full article

29 pages, 3439 KB  
Article
HCHS-Net: A Multimodal Handcrafted Feature and Metadata Framework for Interpretable Skin Lesion Classification
by Ahmet Solak
Biomimetics 2026, 11(2), 154; https://doi.org/10.3390/biomimetics11020154 - 19 Feb 2026
Viewed by 497
Abstract
Accurate and timely classification of skin lesions is critical for early cancer detection, yet current deep learning approaches suffer from high computational costs, limited interpretability, and poor transparency for clinical deployment. This study presents HCHS-Net, a lightweight and interpretable multimodal framework for six-class skin lesion classification on the PAD-UFES-20 dataset. The proposed framework extracts a 116-dimensional visual feature vector through three complementary handcrafted modules: a Color Module employing multi-channel histogram analysis to capture chromatic diagnostic patterns, a Haralick Module deriving texture descriptors from the gray-level co-occurrence matrix (GLCM) that quantify surface characteristics correlated with malignancy, and a Shape Module encoding morphological properties via Hu moment invariants aligned with the clinical ABCD rule. The architectural design of HCHS-Net adopts a biomimetic approach by emulating the hierarchical information processing of the human visual system and the cognitive diagnostic workflows of expert dermatologists. Unlike conventional black-box deep learning models, this framework employs parallel processing branches that simulate the selective attention mechanisms of the human eye by focusing on biologically significant visual cues such as chromatic variance, textural entropy, and morphological asymmetry. These visual features are concatenated with a 12-dimensional clinical metadata vector encompassing patient demographics and lesion characteristics, yielding a compact 128-dimensional multimodal representation. Classification is performed through an ensemble of three gradient boosting algorithms (XGBoost, LightGBM, CatBoost) with majority voting. HCHS-Net achieves 97.76% classification accuracy with only 0.25 M parameters, outperforming deep learning baselines, including VGG-16 (94.60%), ResNet-50 (94.80%), and EfficientNet-B2 (95.16%), which require 60–97× more parameters. 
The framework delivers an inference time of 0.11 ms per image, enabling real-time classification on standard CPUs without GPU acceleration. Ablation analysis confirms the complementary contribution of each feature module, with metadata integration providing a 2.53% accuracy gain. The model achieves perfect melanoma and nevus recall (100%) with 99.55% specificity, maintaining reliable discrimination at safety-critical diagnostic boundaries. Comprehensive benchmarking against 13 published methods demonstrates that domain-informed handcrafted features combined with clinical metadata can match or exceed deep learning fusion approaches while offering superior interpretability and computational efficiency for point-of-care deployment. Full article
(This article belongs to the Section Bioinspired Sensorics, Information Processing and Control)
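The GLCM texture descriptors computed by the Haralick Module have standard definitions. The sketch below builds a co-occurrence matrix for one pixel offset and evaluates the Haralick contrast feature; it is the generic formulation on a tiny invented image, not the paper's code.

```python
def glcm(img, dx, dy, levels):
    """Gray-level co-occurrence matrix for one pixel offset (dx, dy),
    normalized to joint probabilities over valid pixel pairs."""
    P = [[0.0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    total = 0
    for y in range(h):
        for x in range(w):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < w and 0 <= y2 < h:
                P[img[y][x]][img[y2][x2]] += 1
                total += 1
    return [[v / total for v in row] for row in P]

def haralick_contrast(P):
    """Haralick contrast: sum of P[i][j] * (i - j)^2; large when
    neighboring pixels differ strongly in gray level."""
    return sum(P[i][j] * (i - j) ** 2
               for i in range(len(P)) for j in range(len(P)))

P = glcm([[0, 0], [1, 1]], dx=1, dy=0, levels=2)
```

Descriptors like this, stacked with color histograms and Hu moments, form the 116-dimensional visual vector the abstract describes.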

20 pages, 313 KB  
Article
Making the Child Legible: Children’s Literature as Archive and Agent in Central Europe, 1860–2025
by Milan Mašát
Histories 2026, 6(1), 18; https://doi.org/10.3390/histories6010018 - 19 Feb 2026
Viewed by 394
Abstract
Central European children’s literature can be read as both archive—recording shifting norms, institutions, and visual regimes—and agent, a medium through which childhood, citizenship, and cultural memory are made legible. This conceptual article proposes an edition-sensitive framework for analysing texts, images, and paratexts across Central Europe (1860–2025), with particular attention to institutional mediation. Rather than offering a comprehensive dataset or causal claims about reception, it synthesises research in childhood history, book and media history, memory studies, and translation and circulation studies to advance three arguments. First, children’s books are institutionally framed artefacts: paratexts and material features (series branding, curricular endorsements, library markings, pricing cues, regulatory traces) can be read as historically interpretable speech acts of legitimation. Second, shifts in visual and material regimes should be analysed as changing conditions of legibility—expectations of clarity, affect, and authority—rather than as mere stylistic evolution. Third, translation and circulation function as infrastructures that reorganise repertoires and interpretive horizons, complicating nation-centred narratives without exhaustive market mapping. The article concludes by stating methodological limits (catalogue gaps, survival bias, uneven metadata) and outlining a transferable agenda for paratext-centred documentation and edition-sensitive reading. Full article
(This article belongs to the Section Cultural History)
19 pages, 636 KB  
Article
Transferring AI-Based Iconclass Classification Across Image Traditions: A RAG Pipeline for the Wenzelsbibel
by Drew B. Thomas and Julia Hintersteiner
Histories 2026, 6(1), 17; https://doi.org/10.3390/histories6010017 - 18 Feb 2026
Viewed by 419
Abstract
This study evaluates whether a multimodal retrieval-augmented generation (RAG) pipeline originally developed for early modern woodcuts can be effectively transferred to the domain of medieval manuscript illumination. Using a dataset of Wenzelsbibel miniatures annotated with Iconclass, the pipeline combined page-level image input, LLM description generation, vector retrieval, and hierarchical reasoning. Although overall scores were lower than in the earlier woodcut study, the best-performing configuration still substantially surpassed both image-similarity and keyword-based search, confirming the advantages of structured multimodal retrieval for medieval material. Truncation analysis further revealed that many errors occurred only at the deepest Iconclass levels: removing levels raised precision to 0.64 and 0.73, with average remaining depths of 5.49 and 4.49 levels, respectively. These results indicate that the model’s broader hierarchical placement is often correct even when fine-grained specificity breaks down. Taken together, the findings demonstrate that a woodcut-oriented RAG pipeline can be meaningfully adapted to manuscript illumination and that its strengths lie in contextual reasoning and structured classification. Future improvements should incorporate available textual metadata, explore graph-based retrieval, and refine Iconclass-driven pathways. Full article
(This article belongs to the Special Issue Artificial Intelligence (AI) and Historical Research)
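The truncation analysis can be illustrated schematically: cut the deepest levels off each predicted Iconclass code and count a prediction correct when the remainder still prefixes the gold code. The sketch below simplifies a "level" to one character, which real Iconclass notation does not strictly follow, and the example codes are invented.

```python
def truncated_precision(pred_gold_pairs, drop_levels):
    """Fraction of predictions whose code, with the deepest
    `drop_levels` levels removed, is a prefix of the gold code.
    Simplification: one character per hierarchy level."""
    hits = 0
    for pred, gold in pred_gold_pairs:
        kept = pred[:len(pred) - drop_levels]
        if kept and gold.startswith(kept):
            hits += 1
    return hits / len(pred_gold_pairs)

# Invented (predicted, gold) code pairs: errors only at the deepest level
pairs = [("71H713", "71H713"), ("71H714", "71H713"), ("73D312", "73D31")]
```

Rising precision as `drop_levels` grows is exactly the pattern the abstract reports: broad hierarchical placement is right even where fine-grained specificity fails.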

24 pages, 6624 KB  
Article
Application of Computer Vision to the Automated Extraction of Metadata from Natural History Specimen Labels: A Case Study on Herbarium Specimens
by Jacopo Zacchigna, Weiwei Liu, Felice Andrea Pellegrino, Adriano Peron, Francesco Roma-Marzio, Lorenzo Peruzzi and Stefano Martellos
Plants 2026, 15(4), 637; https://doi.org/10.3390/plants15040637 - 17 Feb 2026
Viewed by 818
Abstract
Metadata extraction from natural history collection labels is a pivotal task for the online publication of digitized specimens. However, given the scale of these collections—which are estimated to host more than 2 billion specimens worldwide, including ca. 400 million herbarium specimens—manual metadata extraction is an extremely time-consuming task. Automated data extraction from digital images of specimens and their labels is therefore a promising application of state-of-the-art computer vision techniques. Extracting information from herbarium specimen labels normally involves three main steps: text segmentation, multilingual and handwriting recognition, and data parsing. The primary bottleneck in this workflow lies in the limitations of Optical Character Recognition (OCR) systems. This study explores how the general knowledge embedded in multimodal Transformer models can be transferred to the specific task of herbarium specimen label digitization. The final goal is to develop an easy-to-use, end-to-end solution to mitigate the limitations of classic OCR approaches while offering greater flexibility to adapt to different label formats. Donut-base, a pre-trained visual document understanding (VDU) transformer, was the base model selected for fine-tuning. A dataset from the University of Pisa served as a test bed. The initial attempt achieved an accuracy of 85%, measured using the Tree Edit Distance (TED), demonstrating the feasibility of fine-tuning for this task. Cases with low accuracies were also investigated to identify limitations of the approach. In particular, specimens with multiple labels, especially those combining handwritten and typewritten text, proved to be the most challenging. Strategies aimed at addressing these weaknesses are discussed. Full article
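The reported accuracy is TED-based: one minus a normalized tree edit distance between predicted and gold label structures. As a simplified stand-in, the sketch below applies the same normalization to a flat string edit distance; a true TED operates on the parsed field tree, and the example strings are invented.

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic dynamic program,
    keeping only the previous row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,        # deletion
                           cur[j - 1] + 1,     # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def ted_style_accuracy(pred, gold):
    """1 - distance / max length, clipped at 0: the shape of the
    TED-based accuracy, with strings standing in for label trees."""
    return max(0.0, 1 - edit_distance(pred, gold) / max(len(pred), len(gold)))

acc = ted_style_accuracy("Pinus nigra", "Pinus nigra L.")
```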
24 pages, 4094 KB  
Article
MMY-Net: A BERT-Enhanced Y-Shaped Network for Multimodal Pathological Image Segmentation Using Patient Metadata
by Ahmed Muhammad Rehan, Kun Li and Ping Chen
Electronics 2026, 15(4), 815; https://doi.org/10.3390/electronics15040815 - 13 Feb 2026
Abstract
Medical image segmentation, particularly for pathological diagnosis, faces challenges in leveraging patient clinical metadata that could enhance diagnostic accuracy. This study presents MMY-Net (Multimodal Y-shaped Network), a novel deep learning framework that effectively fuses patient metadata with pathological images for improved tumor segmentation performance. The proposed architecture incorporates a Text Processing Block (TPB) utilizing BERT for metadata feature extraction and a Text Encoding Block (TEB) for multi-scale fusion of textual and visual information. The network employs an Interlaced Sparse Self-Attention (ISSA) mechanism to capture both local and global dependencies while maintaining computational efficiency. Experiments were conducted on two open/public eyelid tumor datasets (Dataset 1: 112 WSIs for training/validation; Dataset 2: 107 WSIs as an independent test set) and the public Dataset 3 gland segmentation benchmark. For Dataset 1, 7989 H&E-stained patches (1024 × 1024, resized to 224 × 224) were extracted and split 7:2:1 (train:val:test); Dataset 2 was used exclusively for external validation. All images underwent Vahadane stain normalization. Training employed SGD (lr = 0.001), 1000 epochs, and a hybrid loss (cross-entropy + MS-SSIM + Lovász). Results show that integrating metadata—such as age and gender—significantly improves segmentation accuracy, even when metadata does not directly describe tumor characteristics. Ablation studies confirm the superiority of the proposed text feature extraction and fusion strategy. MMY-Net achieves state-of-the-art performance across all datasets, establishing a generalizable framework for multimodal medical image analysis. Full article
(This article belongs to the Section Electronic Materials, Devices and Applications)
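The metadata-fusion idea described above can be sketched schematically: a pooled text embedding (e.g., a BERT [CLS] vector) is projected, broadcast over the spatial grid, and concatenated with the image feature map at a given scale. The shapes, names, and projection below are illustrative, not MMY-Net's actual layers.

```python
import numpy as np

def fuse_text_image(feat: np.ndarray, text_emb: np.ndarray,
                    proj: np.ndarray) -> np.ndarray:
    """Concatenate a projected text embedding onto every spatial
    position of an image feature map.

    feat:     (C, H, W) image features at one decoder scale
    text_emb: (D,) pooled metadata embedding (e.g., BERT [CLS])
    proj:     (D, C_txt) projection matrix (learned in practice)
    returns:  (C + C_txt, H, W) fused feature map
    """
    c, h, w = feat.shape
    txt = text_emb @ proj                       # (C_txt,)
    txt_map = np.broadcast_to(txt[:, None, None], (txt.shape[0], h, w))
    return np.concatenate([feat, txt_map], axis=0)
```

Repeating this fusion at several decoder scales is one way metadata such as age and gender can influence segmentation even without describing tumor appearance directly.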
31 pages, 11526 KB  
Review
Transferability and Robustness in Proximal and UAV Crop Imaging
by Jayme Garcia Arnal Barbedo
Agronomy 2026, 16(3), 364; https://doi.org/10.3390/agronomy16030364 - 2 Feb 2026
Abstract
AI-driven imaging is becoming central to crop monitoring, with proximal and unmanned aerial vehicle (UAV) platforms now routinely used for disease and stress detection, yield estimation, canopy structure assessment, and fruit counting. Yet, as these models move from plots to farms, the main bottleneck is no longer raw accuracy but robustness under distribution shift. Systems trained on one field, season, cultivar, or sensor often fail when the scene, sensor, protocol, or timing changes in realistic ways. This review synthesizes recent advances on robustness and transferability in proximal and UAV imaging, drawing on a corpus of 42 core studies across field crops, orchards, greenhouse environments, and multi-platform phenotyping. Shift types are organized into four axes, namely scene, sensor, protocol, and time. The article also maps the empirical evidence on when RGB imaging alone is sufficient and when multispectral, hyperspectral, or thermal modalities can potentially improve robustness. This serves as a basis to synthesize acquisition and evaluation practices that often matter more than architectural tweaks, including phenology-aware flight planning, radiometric standardization, metadata logging, and leave-one-field/season-out splits. Adaptation options are consolidated into a practical symptom/remedy roadmap, ranging from lightweight normalization and small target-set fine-tuning to feature alignment, unsupervised domain adaptation, style translation, and test-time updates. Finally, a benchmark and dataset agenda are outlined with emphasis on object-oriented splits, cross-sensor and cross-scale collections, and longitudinal datasets where the same fields are followed across seasons under different management regimes. The goal is to outline practices and evaluation protocols that support progress toward deployable, auditable systems; such claims require standardized out-of-distribution testing and transparent reporting, as emphasized in the benchmark specification and experiment suite proposed here. Full article
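The leave-one-field/season-out evaluation mentioned above amounts to grouped cross-validation: each held-out fold contains every sample from exactly one group (one field or one season), so no group leaks between train and test. A minimal sketch, with illustrative names:

```python
from collections import defaultdict

def leave_one_group_out(samples, group_of):
    """Yield (held_out_group, train_indices, test_indices) triples,
    where each test fold holds all samples from exactly one group
    (e.g., one field or one season)."""
    groups = defaultdict(list)
    for i, s in enumerate(samples):
        groups[group_of(s)].append(i)
    for held_out, test_idx in groups.items():
        train_idx = [i for g, idx in groups.items()
                     if g != held_out for i in idx]
        yield held_out, train_idx, test_idx
```

Random splits that ignore group structure typically overstate accuracy, because near-duplicate images from the same field end up on both sides of the split; grouped folds expose the distribution shift the review is concerned with.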
10 pages, 1516 KB  
Data Descriptor
Multiplex Immunofluorescence and Histopathology Dataset of Cell Cycle–Related Proteins in Renal Cell Carcinoma
by Hazem Abdullah, In Hwa Um, Grant D. Stewart, Alexander Laird, Kathryn Kirkwood, Chang Wook Jeong, Cheol Kwak, Kyung Chul Moon, TranSORCE Team, Tim Eisen, Elena Frangou, Anne Warren, Angela Meade and David J. Harrison
Data 2026, 11(2), 27; https://doi.org/10.3390/data11020027 - 1 Feb 2026
Abstract
Clear-cell renal cell carcinoma (ccRCC) accounts for the majority of kidney cancer diagnoses and exhibits widely variable clinical behaviour. The dataset described here was generated to support the discovery of robust biomarkers of tumour cell-cycle arrest and to inform the risk-stratified management of ccRCC. We assembled four independent cohorts including 480 patients from the UK arm of the SORCE adjuvant trial, 300 patients from a surgically treated series in Korea, 120 patients from a retrospective Scottish cohort, and a paired primary–metastatic cohort comprising 62 patients. Formalin-fixed paraffin-embedded nephrectomy specimens were processed for routine hematoxylin and eosin (H&E) histology, and for multiplex immunofluorescence (mIF). The mIF panels detect the cyclin-dependent kinase inhibitor p21CDKN1a, the DNA replication licensing factor MCM2, endoglin/CD105, Lamin B1 and nuclear DNA (Hoechst). Whole-slide images (WSIs) were acquired at high resolution, and artificial-intelligence pipelines were used to segment nuclei, classify individual cells into arrested phenotypes, and calculate the fraction of cells in each phenotype. Accompanying metadata include demographics, tumour stage, grade, Leibovich score, treatment arm (sorafenib/placebo), relapse events, and disease-free survival. All images and derived tables are released under a CC0 licence via the BioImage Archive, ensuring unrestricted reuse. This multi-cohort dataset provides a rich resource for studying cell-cycle arrest and proliferation markers, training image-analysis algorithms, and developing prognostic signatures in RCC. Full article
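The per-slide phenotype fraction described above can be illustrated with a minimal sketch over a cell-level table. The two-column (slide_id, phenotype) layout and the phenotype label are illustrative, not the dataset's actual schema.

```python
from collections import Counter

def arrested_fraction(cells):
    """cells: iterable of (slide_id, phenotype) pairs from a
    cell-level classification table.
    Returns {slide_id: fraction of cells labeled 'arrested'}."""
    total, arrested = Counter(), Counter()
    for slide_id, phenotype in cells:
        total[slide_id] += 1
        if phenotype == "arrested":
            arrested[slide_id] += 1
    return {s: arrested[s] / total[s] for s in total}
```

Aggregating cell-level calls into per-slide fractions like this is what links the AI segmentation output to the patient-level clinical metadata (stage, grade, survival) released alongside the images.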